ON THE STRUCTURE OF ASSOCIATIVE MEANING ¹


JAMES DEESE 


Johns Hopkins University 


Ever since Hobbes wrote “From St. Andrew the mind runneth to St. Peter
and from St. Peter to Stone,” psychologists and philosophers concerned
with the nature of the mind have been trying to describe the nature of
the associations that are so evident in human verbal behavior. Locke
tried to picture the structure of the human mind by describing
relations between associations. He attributed associations to the more
or less accidental contingencies of perceptual qualities in the real
world. By so doing, Locke became responsible for a major tradition in
the theory of both thought and perception.

Almost all of the theoretical attempts to deal with association stem
from Locke. For experimental psychologists interested in learning,
however, the most important theoretical contributions come from the
so-called secondary laws of association, introduced by Thomas Brown
and David Hartley. Associations themselves are supposed to arise by
contiguity, similarity, etc., but they occur in the strengths and
distributions that they do because of their frequency, vividness, and
so on. It is experimental treatment of the secondary principles that
has led to most of the stable empirical generalizations

1 The work reported in this paper was sup- 
ported by funds from the National Science 
Foundation, Grant 13055. 
about associative learning, though the
primary “laws” have always been of 
greatest theoretical importance. 

Since Ebbinghaus, the major experi- 
mental treatment of association has 
been with artificial material, mostly 
nonsense syllables. Consequently the 
relations of contiguity, similarity, etc., 
have been arranged by experimenters, 
not obtained from the natural relations 
among linguistic elements. 

Somewhat earlier than Ebbinghaus, however, Sir Francis Galton
inaugurated the study of naturally occurring associations. For the
most part, the tradition begun by Galton has remained psychometric
rather than experimental. Recently, these two traditions have been
combined. Nevertheless, most of the attempts to deal with the
structure or organization of naturally occurring associations have
been empirical rather than theoretical, in the tradition of Locke.

Most of the empirical studies of the organization of naturally
occurring associations have been studies of classification of
associations. Associations are classified by logical relations or by
the dictionary meanings of words which occur as stimuli or as
responses in free association tests.

A good summary of most of the attempts to discover the structure of
associations by classification can be found
in the first edition of Woodworth’s Experimental Psychology (1938).
Woodworth both reviews a half century of German, American, and English
attempts at classification and presents a classification scheme of his
own. Woodworth’s own classification is more obviously rooted in the
association frequencies themselves than are the earlier ones. It is
also unique in that it is a two dimensional classification. Woodworth
classifies both by meaning, in the traditional sense, and by
meaningfulness (in the sense used by Noble and others).

As with the earlier classifications, Woodworth’s is a combination of
dictionary and grammar book rules, filtered through Woodworth’s own
verbal behavior and the verbal behavior of the investigators whose
work he summarizes. The trouble with it and with all classifications
to date is that they are not completely rooted in the association
process itself. They attempt to impose upon the associations the
logical relations found in grammars and the relations between words
stated in dictionaries and thesauruses as well as some of the
relations among objects perceived in the natural world. Perhaps the
major result of such classifications is to convince us that while
associations are partly related to the meanings of words in all of the
above senses, associations are meaningful in precisely none of the
above senses.

The meaningfulness of words refers to organized relations among the
words and among words and objects in the natural world. Associations
are related to one another in some organized fashion and also related
in some way to perceived properties in the natural world. The only
conclusion is that there is some sense in which one may speak of
“associative meaning” and that the way to discover the associative
meaning of words is not to classify associations according to other
senses of meaning but to discover the relationships associations have
with one another. That is the purpose of the present paper.

Aside from the difficulty summarized 
above, all previous attempts at classi- 
fication, including the interesting one 
proposed by Karwoski and Berthold 
(1945), have suffered from another de- 
fect. All schemes proposed thus far 
have attempted to deal simultaneously 
with the stimuli and the responses to 
them. That is to say, they attempt to 
classify the relation between stimuli and 
responses, even though the individual 
members vary. Thus, the result frequently is an attempt to equate such
pairs as SLOW-Fast and BLUE-Yellow.² Apart from the implicit
assumptions made by collapsing disparate stimulus and response terms
together, the possibility is ignored that certain reciprocal pairs,
for example, WORK-Play, PLAY-Work, bear no simple consistent relation
to one another.

For this reason, it seems more sen- 
sible to simplify the problem by dealing 
with the distribution of responses with 
stimuli held constant or with the distri- 
bution of stimuli for any given response. 
The distributions to different stimuli or 
the stimuli for pairs of responses may 
then be compared. This comparison 
should provide the first step in under- 
standing how associations are related 
to one another. 


RELATIONSHIPS AMONG ASSOCIATES 


For the past several years, the pres- 
ent author has been studying the extent 
to which the relations among associa- 
tions determine such aspects of immedi- 
ate free recall as clustering, importations 
into recall, etc. (Deese, 1959a, 1959b, 
1960). This work had its origin in 

2 In this paper, free association stimuli will be given in small
capitals and responses in italics.
the studies of associative clustering
(Jenkins & Russell, 1952) and category
clustering (Bousfield, 1953). The basic
method has been to relate an index 
based upon the average frequency with 
which all words presented for recall 
tend to elicit one another as free as- 
sociates with the number of words 
correctly recalled, the number of im- 
portations, etc. This index is a power- 
ful predictor of a number of character- 
istics of free recall. Free recall itself 
is a representative verbal process, and 
thus it is evident that the associative in- 
terrelations among words are powerful 
determiners of the pattern and fre- 
quency with which words are emitted. 

The interword associative index is 
computed by tabulating the frequencies 
with which each word in a particular 
collection (usually a list of words to be 
presented for recall) occurs as a re- 
sponse to all of the other words as free 
association stimuli. Casual inspection 
of the matrix that provides the basis for 
such a tabulation gives striking con- 
firmation of what every investigator 
who has studied free association has 
felt to be the case, namely that associa- 
tions exist in well organized and in 
some instances tightly organized net- 
works. The networks are so apparent 
that they turn up even in random col- 
lections of words, if the words are high 
in frequency of usage (Deese, 1960). 
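
The tabulation behind the interword associative index can be sketched in a few lines. This is only an illustration under stated assumptions: the norms below are invented, the function names are hypothetical, and the index is taken here to be the mean relative frequency with which each list word is elicited by the other list words, which is one reading of the description above.

```python
# Hypothetical free-association norms (all counts invented for illustration):
# norms[STIMULUS][response] = number of subjects, out of N, giving that response.
N = 50
norms = {
    "BUTTERFLY": {"moth": 10, "insect": 8, "wing": 5, "net": 4},
    "MOTH": {"butterfly": 6, "insect": 9, "flame": 7},
    "INSECT": {"bug": 20, "moth": 1, "fly": 6},
}


def interword_matrix(norms, words):
    """Tabulate how often each list word occurs as a response to the other
    list words used as stimuli (rows: responses, columns: stimuli)."""
    return {
        resp: {
            stim: norms[stim].get(resp.lower(), 0)
            for stim in words
            if stim != resp
        }
        for resp in words
    }


def interword_index(norms, words, n_subjects):
    """Assumed form of the index: the mean, over list words, of the summed
    relative frequency with which the other list words elicit each word."""
    matrix = interword_matrix(norms, words)
    per_word = [sum(row.values()) / n_subjects for row in matrix.values()]
    return sum(per_word) / len(per_word)


words = ["BUTTERFLY", "MOTH", "INSECT"]
matrix = interword_matrix(norms, words)
index = interword_index(norms, words, N)
```

Responses such as "net" or "flame" never enter this matrix because they are not list words, which is the restriction taken up in the next paragraph.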

The networks that appear upon such 
a tabulation provide only part of the 
picture, however, because the words 
that appear as responses are artificially 
restricted to words which appear as 
stimuli. When examined, the original 
association frequencies show that stim- 
uli which do not elicit one another as 
responses sometimes have in common 
a great many responses. An example, 
picked more or less at random, is the 
pair PIANO and SYMPHONY. In a sample
of Johns Hopkins University undergraduates,
neither one elicited the other,
but both elicited Note, Song, Sound,
Noise, Music, and Orchestra in varying 
frequencies. 

Thus, the relationship between free 
association stimuli should be deter- 
mined from all of the responses that 
stimuli have in common. If the asso- 
ciative meaning of any stimulus is given 
by the distribution of responses to that 
stimulus, then two stimuli may be said 
to have the same associative meaning 
when the distribution of associates to 
them is identical. Two stimuli overlap 
or resemble one another in associative 
meaning to the extent that they have 
the same distribution of associates. 
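
As a small illustration of this definition, the sketch below treats each stimulus as a mapping from responses to frequencies. The six shared responses are the ones named in the text for PIANO and SYMPHONY; every frequency, and the two unshared responses, are invented placeholders.

```python
# Response distributions for two stimuli. The shared response words come from
# the text; all frequencies and the words "keys" and "concert" are hypothetical.
piano = {"note": 4, "song": 3, "sound": 2, "noise": 1,
         "music": 15, "orchestra": 2, "keys": 8}
symphony = {"note": 1, "song": 2, "sound": 3, "noise": 1,
            "music": 20, "orchestra": 12, "concert": 5}

# Responses the two stimuli have in common, regardless of frequency.
common = sorted(set(piano) & set(symphony))

# Identical associative meaning would require identical distributions.
same_meaning = piano == symphony
```

The two stimuli share responses without eliciting one another, so they resemble each other in associative meaning without being identical in it.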

One may also deal with the associative distribution³ of the responses
themselves. The associative distribution of a response is the
collection of stimuli to which it occurs. This relation is less
interesting than that between responses to stimuli for a number of
reasons. It does not, of course, specify an associative concept, as
does the distribution of responses to a stimulus, and it is subject to
serious sampling limitations. One is fairly sure to pick up any
relations of statistical significance among stimuli by a modest sample
of responses. With the distribution of stimuli problem, however, there
is no way to insure that one has sampled all stimuli that are high
frequency producers of a particular response short of exhausting the
language. Therefore, except insofar as the distribution of stimuli
problem is related to the problem of associative meaning, it will be
ignored in the balance of this paper.


SPECIFICATION OF RELATIONSHIPS 


The idea of associative meaning pre- 
sented above implies that one describes 


3 The word “distribution” is used here in
analogy to its linguistic, not its statistical 
sense. That is to say it indicates the number 
of situations to which a particular word 
occurs as a response. 
and classifies associations purely in
terms of the relations among the re- 
sponses to different stimuli. There are 
several ways in which such relations 
may be specified. One may count the 
number of associations in common in 
any fixed size sample, or one may 
weigh the common associations by their 
frequency. For prediction of effects in 
verbal behavior, the latter procedure is 
probably more precise, and it is the ap- 
proach taken in the present paper. 
Also, one may wish to describe the 
ways in which stimuli resemble one 
another in associative meaning. This 
implies the study of a collection of re- 
lations between a number of stimuli. 
Factor analysis suggests itself as a way 
of reducing the structure within such a 
collection to manageable proportions. 
In any event, the heuristic value of the 
concept of associative meaning will de- 
pend upon ways of specifying the rela- 
tions between free association stimuli. 


The purpose of this section is to pro- 
vide a basic method for measuring the 
relationship. 


There are several difficulties that 
stand between the definition of associa- 
tive meaning and the specification of 
particular associative meanings by a 
study of the relationships among free 
associations. The first of those is cre- 
ated by the technique of collecting free 
associations, which almost completely 
eliminates the occurrence of the stimu- 
lus word as a response to itself. This 
problem must be dealt with first. 

The first step in an analysis of associations is to tabulate for each
stimulus word in a collection the responses which the word has in
common with any other stimulus words in the collection. Since, by
instructions as well as by technique a word usually does not occur as
a response to itself in free association, the cells in such a
tabulation in which the same word as response and stimulus intersect
will be empty. This creates intuitive difficulties if one
defines the relation of associative mean- 
ing between words as the common dis- 
tribution of responses, because pairs of 
words often appear to be intuitively re- 
lated when they have no overt responses 
in common. Upon close examination 
all such intuitive cases turn out to be 
cases in which the responses to one 
word are the stimuli to the other. An 
example in the Hopkins data is the 
SOFT-LOUD pair. Both of these words 
elicit one another, but they have no 
other responses in common. Thus, 
they could have no associative meaning 
in common if the possibility of a word 
occurring as a response to itself is 
denied. 

Fortunately, a way out of this para- 
doxical situation already exists in a 
technique independently designed for 
other purposes by Jenkins and Cofer 
(1957) and by Bousfield, Cohen, and 
Whitmarsh (1958). The technique 
consists of assuming that a word elicits 
itself implicitly as an associate 100%
of the time. This assumed implicit re- 
sponse is called by Bousfield, Cohen, 
and Whitmarsh the “representational” 
response. The implication of the name 
is that this response identifies or repre- 
sents the symbolic stimulus presented 
to the individual. 

Both Jenkins and Cofer (who do not 
explicitly identify the representational 
response) and Bousfield, Cohen, and 
Whitmarsh implicitly make the assump- 
tion that the representational response 
is always the same word as the stimulus 
presented. This assumption is very 
nearly correct for words of common 
cultural use, but it must surely be 
incorrect for unusual words. Many 
Hopkins undergraduates, for example, 
respond to the words ABBESS and ABYSS
in precisely the same way, which sug- 
gests that the representational response 
for one of them is not the dictionary 
equivalent of the word. Nevertheless, 
this assumption works with only trivial
exceptions for moderately sophisticated 
individuals and words of ordinary fre- 
quency of usage. It is, therefore, 
adopted here. 

With this assumption it is possible to prepare a table like Table 1.
This table shows the frequency of responses common to any two stimuli,
with the frequency of the stimulus word as a response to itself
entered with 100% frequency. Since the N for this table is 50, the
value entered in the stimulus-response intersection cells is 50. Note
that it is not necessary for all the stimulus-response intersections
to appear. If a particular stimulus word never appears as a response
to the other stimuli, it will not appear as a common response. Such
instances are rare, however, when the collection of stimuli is made on
some basis that predicts related associative meaning.

The second problem is to find an appropriate measure of the relation
between the responses common to different stimuli. This is the problem
of finding a measure of the relations between the columns in Table 1.
The most obvious solution is to use product-moment correlations as the
measure of relations between columns. There are, however, a number of
objections to such a solution.

The most important objection, determined in part by the fact that the
technique of free association limits each subject to one response, is
that the product-moment correlation would yield a measure of the
correlation of distributions of associations rather than the extent to
which frequencies are in common. Often, responses that are high in
frequency to one stimulus will be low in frequency to another. This
happens very often because of the typically steep rank-frequency
distribution of free association responses to any given stimulus.
Because the forms of the distributions of common responses may be
inverse to one another, the obtained product-moment correlations of
the frequencies will often be negative. Thus, a table of
product-moment correlations obtained from Table 1 would yield a few
zero correlations, a few high positive correlations and a large number
of low negative correlations.
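
The objection can be seen in a toy case. The sketch below uses invented frequencies for three responses shared by two stimuli, with the high-frequency response to one stimulus being a low-frequency response to the other. The shared frequency is taken here, as an assumption, to be the smaller of the two frequencies for each response.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation of two equal-length frequency columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical frequencies of three common responses to two stimuli.
col_a = [30, 2, 5]
col_b = [2, 30, 5]

r = pearson(col_a, col_b)

# Frequency actually held in common (assumed: smaller value per response).
shared = sum(min(a, b) for a, b in zip(col_a, col_b))
```

Here the two columns share responses, yet the correlation of their frequencies is negative, because it reflects the inverse shapes of the two distributions.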

Since it is not the relation between 
the forms of the distributions of re- 
sponses but the relations between the 
frequencies themselves that is inter- 
esting, a direct measure of overlapping 
frequency is appropriate. To measure 
the commonness of associative meaning, 
we may take the ratio of the sum of the 
overlapping frequencies to any pair of 
stimuli to the maximum possible sum. 
Thus, if we turn to the first two col- 
umns in Table 1, which are the responses 
to MOTH and INSECT respectively, 
we see that the response frequencies in 
common sum to 12. The total possible 
frequency in common is 100; this yields 
a relative common frequency of .12. 
This number is entered into the ap- 
propriate cell in Table 2. 
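
A minimal sketch of this calculation follows. It assumes that the overlapping frequency for each response is the smaller of the two frequencies, and that the maximum possible sum is 2N because each column holds N explicit responses plus the representational response entered N times. The function names and all response frequencies are hypothetical; they were chosen only so that the common frequencies sum to 12 as in the MOTH and INSECT example, and are not the Table 1 entries.

```python
def with_representational(stimulus, responses, n_subjects):
    """Copy a response distribution and enter the stimulus word itself
    as an implicit response given by every subject."""
    dist = dict(responses)
    dist[stimulus] = n_subjects
    return dist


def overlap_coefficient(stim_a, resp_a, stim_b, resp_b, n_subjects):
    """Relative common frequency: summed overlapping frequencies divided
    by the maximum possible sum (assumed to be 2 * n_subjects)."""
    a = with_representational(stim_a, resp_a, n_subjects)
    b = with_representational(stim_b, resp_b, n_subjects)
    common = sum(min(a[w], b[w]) for w in a.keys() & b.keys())
    return common / (2 * n_subjects)


N = 50
# Invented response frequencies, not the published data.
moth = {"insect": 3, "fly": 5, "bug": 3, "butterfly": 8, "flame": 10}
insect = {"moth": 1, "fly": 6, "bug": 20, "ant": 4}

coefficient = overlap_coefficient("moth", moth, "insect", insect, N)
```

With these numbers the common frequencies are 3 (insect), 1 (moth), 5 (fly), and 3 (bug), which sum to 12 and give 12/100.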

A simple interpretation of such a measure demands some assumptions
which are not strictly correct. In other words, for most problems,
this measure can be regarded as an approximation. It supposes that the
distribution of responses taken one per individual per stimulus word
from a homogeneous population is characteristic of the distribution of
associations at different times in any one representative individual
from that population. Thus, the assumption is made that the
distribution of responses obtained from a sample of individuals is the
same as the distribution of associations in any one individual.

This is an assumption shared by most students of free associations in
population samples (and one only occasionally explicitly stated). An
experimental comparison of repeated testing vs. single testing is a
badly needed addition to the association literature, and it may well
lead to a more satisfactory way of counting instances of commonness of
association.

This measure of the relation in asso- 
ciative meaning between two words can 


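TABLE 1

FREQUENCIES OF ASSOCIATES IN COMMON TO 19 WORDS
BASED ON RESPONSES OF 50 SUBJECTS

Stimulus words* (columns 1-19): Moth, Insect, Wing, Bird, Fly, Yellow,
Flower, Bug, Cocoon, Color, Blue, Bees, Summer, Sunshine, Garden, Sky,
Nature, Spring, Butterfly

Response words (rows): Moth, Insect, Wing, Bird, Fly, Yellow, Flower,
Bug, Cocoon, Color, Blue, Bees, Summer, Sunshine, Garden, Sky, Nature,
Spring, Butterfly, Light, Ant, Bright, Feather, Flight, Tree, Winter,
Warm, Plant, Gray, Brown, Vacation

[Cell frequencies are not legible in this copy.]

* The numbers of the stimulus words correspond to the first 19 response words.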

TABLE 2

OVERLAP COEFFICIENTS FOR COMMON ASSOCIATES BETWEEN
THE 19 WORDS IN TABLE 1
(Decimals Omitted)

Stimulus words (rows and columns): MOTH, INSECT, WING, BIRD, FLY,
YELLOW, FLOWER, BUG, COCOON, COLOR, BLUE, BEES, SUMMER, SUNSHINE,
GARDEN, SKY, NATURE, SPRING, BUTTERFLY

[Coefficients are not legible in this copy.]


vary from .00 to 1.00. The latter value would happen, however, only if
the two stimuli produce completely reciprocal responding. Thus,
complete identity in associative meaning can occur by this measure
only if two stimulus words always elicit each other and no other
words. This is a consequence of the assumption of the frequency of the
representational response. Thus, if a pair of stimuli had
distributions of responses exactly alike except that they never
elicited each other, the maximum relative frequency in common would be
.5. Such a result appears at first to be a very undesirable property
of the measure of relation in associative meaning. However, it turns
out, as will be pointed out later, that this property is extremely
important in generating some useful consequences of the concept of
associative meaning.
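
The two limiting cases can be checked directly. The sketch below assumes the overlapping frequency for a response is the smaller of its two frequencies and that the maximum possible sum is 2N; the stimulus and response names are placeholders, not data from the Hopkins sample.

```python
def overlap(dist_a, dist_b, n_subjects):
    """Relative common frequency for two distributions that already
    include the representational response (assumed maximum: 2 * N)."""
    shared = dist_a.keys() & dist_b.keys()
    return sum(min(dist_a[w], dist_b[w]) for w in shared) / (2 * n_subjects)


N = 50

# Case 1: each word always elicits the other and nothing else.
word_a = {"a": N, "b": N}   # "a" is the representational response to A
word_b = {"b": N, "a": N}   # "b" is the representational response to B
reciprocal = overlap(word_a, word_b, N)

# Case 2: identical explicit responses, but the words never elicit each other.
word_x = {"x": N, "p": 30, "q": 20}
word_y = {"y": N, "p": 30, "q": 20}
never_reciprocal = overlap(word_x, word_y, N)
```

In the second case the two representational responses have nothing to match against, so half of the possible common frequency is unavailable.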

The relative overlap coefficients for all pairs of stimuli within a
set produce a matrix of the sort found in Table 2. The only problem
remaining is to fill in the principal diagonal, or the communalities
in an ordinary table of correlations. In keeping with the assumptions
made above about the assessment of associative meaning, the proper
value to enter into these cells is 1.0, since any word always has the
same distribution of response frequencies as itself. This is like the
assumption introduced by Osgood and Suci (1952) into their factorial
portrayal of the semantic differential.


ANALYSIS OF ASSOCIATIVE MEANING


Before examining some of the impli-
cations of this notion of associative 
meaning, it would be helpful to look at 
some sample data showing some of the 
relations that can exist between the 
associative meanings of different words. 
It would be possible, of course, to 
apply the present analysis to random 
collections of words, but because the 
overlap in associative meaning between 
words taken at random is not high, 
such an application would not be re- 
warding at this stage. In general, an 
analysis of associative meaning would 
be conducted on a set of stimulus words 
selected because we would be interested 
in discovering their relationships and 
because we would suspect, from prior 
information, that they were associa- 
tively related. 

The data presented here illustrating
some of the possible relations in asso- 
ciative meaning are not psychologically 
interesting, with one exception. They 
were, however, selected in such a way 
as to insure some degree of related 
associative meaning; thus they illus- 
trate fairly well some of the analytic 
possibilities. 

The sets of stimulus words in Tables 
3, 4, and 5 have internal structure for 
the simple reason that the stimulus 
words themselves are all responses to 
one of the stimulus words by the 
Minnesota norms (Russell & Jenkins, 
1954). For example, the words in 
Table 3 include BUTTERFLY and words 
which are responses to BUTTERFLY in 
the Minnesota norms. 

These data are based upon samples 
of 50 subjects each, which are a bare 
minimum for any kind of reliable an- 
alysis and too small for anything but 
the roughest kind of prediction. They 
serve, however, to illustrate how sets of 
words are related to one another and 
what the factor structure of associations 
is like. The subjects from which these 
associations were obtained are under- 
graduate students (male) at the Johns 
Hopkins University. 

The basic data from which the fac- 
tors in Table 3 are derived are in Table 
2. These are relative common frequen- 
cies of responses to 19 words in the 
BUTTERFLY set. These relative fre- 
quencies were factored by the centroid 
method, and after a very few rotations 
(pairs of factors were rotated through 
approximately 45 degrees) the factor 
loadings presented in Table 3 were obtained. The factor loadings in
Table 4, based upon stimulus words obtained from MUSIC, and Table 5,
based upon stimulus words obtained from SLOW, were likewise arrived at
by centroid analysis and very simple rotation.
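
For readers unfamiliar with the centroid method, the first step can be sketched as follows. This is the textbook first-centroid computation (column sums divided by the square root of the grand total), not a reconstruction of the analysis reported here. The 3 x 3 matrix is invented, with 1.0 on the diagonal as described earlier, and the sign reflection needed for later factors and the rotation step are omitted.

```python
from math import sqrt


def first_centroid_factor(matrix):
    """Loadings on the first centroid factor: each column sum divided by
    the square root of the sum of all entries."""
    size = len(matrix)
    col_sums = [sum(row[j] for row in matrix) for j in range(size)]
    total = sum(col_sums)
    return [c / sqrt(total) for c in col_sums]


def residual_matrix(matrix, loadings):
    """Remove the first factor's contribution; later factors would be
    extracted from this residual after reflecting signs (not shown)."""
    size = len(matrix)
    return [
        [matrix[i][j] - loadings[i] * loadings[j] for j in range(size)]
        for i in range(size)
    ]


# Hypothetical overlap coefficients for three stimulus words.
overlaps = [
    [1.00, 0.12, 0.05],
    [0.12, 1.00, 0.20],
    [0.05, 0.20, 1.00],
]

loadings = first_centroid_factor(overlaps)
residual = residual_matrix(overlaps, loadings)
```

A useful check on the arithmetic is that every column of the residual matrix sums to zero once the first centroid factor has been removed.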

Since the major emphasis in these sample data is on the factor
presentation, a word needs to be said about
factor analysis in this context. Factors, 
of course, depend upon the variables 
used to obtain them. In testing theory 
this is a sometimes overlooked restric- 
tion if for no other reason than that test 
theorists are often convinced that the 
batteries they factor are in some sense 
“representative” of all the set of abili- 
ties they wish to factor. In the analysis 
of words, however, this is not the case, 
so we are forced to an awareness of the 
limitations of factor extraction. The 
words that go into any matrix are only 
a small part of those that might possibly 
have been used. Thus, the results of 
the factor analysis are properties of the limited set, and we cannot
talk about the factor loadings for any one word, for example, as if
these exhausted all of the associative structure of that word. As it
turns out, this is a very useful property of the analysis of
associative relations among words, but it tends to make trivial
collections of words, such as those in Tables 3, 4, and 5, less
interesting in a factor analysis than would be collections of words
with some suspected important structure to be uncovered.

The first thing to notice about the 
factor loadings is the convenient simple 
structure in the factors. In Table 3, 
approximately half of the words have 
nearly zero loadings on Factor I, while 
the other half of the words have positive loadings of about the same
magnitude. Those words on Factor I that have approximately zero
loadings have positive loadings on Factor II, and the words with
positive loadings on I have nearly zero loadings on II. Thus, the
first two factors effectively sort the stimulus words into two
classes.

This kind of initial classification is a property of the kind of
distribution of common frequencies found in Table 2,
and since this distribution is typical of
all of the 15 or so matrices thus far 
examined by the author, it will probably 
occur in all centroid analyses of words. 
The same initial classification occurs in Tables 4 and 5, and it also
occurs in Table 6 for which the stimulus words were not selected for
associative relations but because they centered about a particular
attitude or value system.

An inspection of the loadings of Factor I in Table 3 shows that it
occurs in words having to do with animate creation (BEES, FLY, BUG,
WING, BIRD, etc.) whereas the words with loadings on Factor II are
those words in the set that do not have to do with animate things
(SKY, YELLOW, NATURE, etc.). It would be a mistake, however, to fall
into the practice of naming factors derived from associative
distributions, for inevitably we fall into the attempt to



TABLE 3

ROTATED CENTROID FACTOR LOADINGS OF
STIMULUS OVERLAP COEFFICIENTS
PRESENTED IN TABLE 2
(Decimals Omitted)

Words (rows): Moth, Insect, Wing, Bird, Fly, Yellow, Flower, Bug,
Cocoon, Color, Blue, Bees, Summer, Sunshine, Garden, Sky, Nature,
Spring, Butterfly

[Factor loadings are not legible in this copy.]

impose a logical structure on the associations,
and one thing that is clear is
that associative meaning is not logical. 
Nevertheless, it is clear that the first 
two factors have split the stimulus 
words into the most obvious and basic 
structure organizing them. 

Factor III in Table 3 shows zero 
loadings on the nonanimate words 
again (except for NATURE, which ap- 
parently comes close to straddling the 
basic division of words in Table 3), 
and it splits the animate words into a 
bipolar factor. The positive loadings 
are on WING, BIRD, BEES, FLY and the
negative loadings on BUG, COCOON,
MOTH, BUTTERFLY (which probably occurs
here because of its strong relation
to BUG and COCOON). Factor IV makes
a bipolar split of the nonanimate words. 
SUMMER, SUNSHINE, GARDEN, FLOWER, 
and SPRING go together while BLUE, 
SKY, YELLOW, and COLOR go together. 

It is instructive to follow the factor 
profiles for pairs of words within a set. 
This allows us one of several ways of 
preparing an associative thesaurus, in 
this case a pair thesaurus. Thus, we 
may say that BLUE and YELLOW are 
both alike within this set in that they 
share common loadings with SKY, SUMMER, SUNSHINE, COLOR, etc. On one
factor, however, they diverge; this provides the critical difference
between BLUE and YELLOW within this set of words. BLUE goes with SKY,
BUTTERFLY, WING, and BIRD, while YELLOW goes with INSECT, FLY, BUG,
and BEES. This, of course, does not exhaust the contrasts in
associative meaning between BLUE and YELLOW, but it does
suggest one way in which they differ. 

Table 4 presents the first six factors from a centroid analysis of the
matrix of stimulus words which are themselves responses to MUSIC.
Again, the first two factors yield a basic split. Factor I has
loadings on TONE, INSTRUMENT,

TABLE 4

ROTATED CENTROID FACTOR LOADINGS OF
STIMULUS OVERLAP COEFFICIENTS FOR
17 WORDS DERIVED FROM MUSIC
(Decimals Omitted)

Words (rows): Tone, Instrument, Symphony, Sing, Note, Song, Sound,
Piano, Noise, Band, Horn, Loud, Hear, Opera, Ear, Music, Soft

[Factor loadings are not legible in this copy.]


SYMPHONY, SING, NOTE, SONG, PIANO, BAND, HORN, OPERA, and MUSIC, while
Factor II has loadings on SOUND, NOISE, LOUD, EAR, and HEAR (this is a
less clean split than the previous table because of the small loading
of TONE on Factor I). As with the previous table, Factors III through
VI are bipolar. Again, interesting contrasts may be seen in factor
profiles. OPERA and SYMPHONY, for example, have parallel profiles
except on Factor III where they are bipolar opposites. Here, OPERA
goes with SONG, NOTE, and SING, while SYMPHONY goes with INSTRUMENT,
PIANO, BAND, and MUSIC.


Table 5 exhibits exactly the same sort of pattern with the words
centering around SLOW and so will not be commented on further. This
table illustrates an effect to be found in associations in adults, the
close relations of opposites. The pair SLOW-Fast yield an overlap
index of .71 for the Hopkins population, the highest for any of the
words in Tables 3, 4, and 5. For the six factors extracted, the factor
profiles for these two words are practically identical. This means
that there is not enough contrast provided within the matrix of words
to separate these words within the arbitrary limit of six factors.
This is in contrast with the pair SOFT-LOUD, which separates almost
immediately in the MUSIC table. The presence of NOISE, HEAR, HORN,
etc., managed to pull SOFT and LOUD apart by providing some
associative meaning overlapping with LOUD but not with SOFT.

Because of the close relation between
SLOW and FAST, it may be very
difficult to provide sets which will con- 
trast these in associative meaning. They 
are, as are most of the high frequency 
opposites, adjectives, and they are de- 
scriptive of a common state modified 
only in a kind of secondary way. That 


TABLE 5

ROTATED CENTROID LOADINGS OF STIMULUS
OVERLAP COEFFICIENTS OF 16 WORDS
DERIVED FROM SLOW
(Decimals Omitted)

Words (rows): Slow, Walk, Speed, Quick, Lazy, Drive, Skid, Run, Work,
Fast, Down, Stop, Snail, Sign, Poke, Traffic

[Factor loadings are not legible in this copy.]

is to say, both words imply motion; they both describe the rapidity of
motion, and they contrast only in the kind of rapidity. The way in
which SOFT and LOUD are alike, however, provides only part of the
distribution for SOFT, since it exists also in a close relation with
the word HARD. HARD and LOUD are related through SOFT but otherwise
independent.

Table 6 illustrates the factor analysis 
of a set of stimulus words which are 


psychologically more interesting. These 


words are taken from a larger study, to 
be presented elsewhere, which is con- 
cerned with the factor structure of asso- 
ciative meaning of critical words for 
subjects scoring high and subjects scor- 
ing low in each of the value scales from 
the Allport, Vernon, and Lindzey
(1960) Study of Values inventory. The
critical words for each value scale were 
obtained from the actual items scored 
for that value scale; thus the words in 
Table 6 are from the Religious Value 
items. 


TABLE 6

ROTATED CENTROID LOADINGS OF STIMULUS
OVERLAP COEFFICIENTS FOR 15 WORDS
CENTERING IN RELIGIOUS VALUE

Stimulus words: Sermon, Clergyman, Religion, Service, Worship,
Reverence, Prayer, Soul, Spirit, Divine, Faith, Inspire, Devotion,
Love, Hope


Completed data are available at present
only on high scorers (the distribution
of scores in the Hopkins population
is skewed). Thus, the factors in Table 
6 are based upon 50 Hopkins under- 
graduates who scored at or above one 
probable error in the published norms 
for men on religious value. The first 
thing to notice about Table 6 is that it 
is very much the same as Tables 3, 4, 
and 5. The initial classification in this 
case sorts the words into a group in- 
cluding DIVINE, SPIRIT, PRAYER, WOR- 
SHIP, REVERENCE, RELIGION, FAITH, 
and DEVOTION and into a group includ- 
ing HOPE, SOUL, LOVE, INSPIRE, SERVICE, 
SERMON, and CLERGYMAN. One is al- 
most tempted to say that the classifica- 
tion divides the words into an institu- 
tional, theological class and a human 
individual class. 


IMPLICATIONS OF ASSOCIATIVE
MEANING


By now, some readers will have recognized the notions presented
here as extensions of those stated by Bousfield, Cohen, and
Whitmarsh (1958), and since these authors regard their work as an
extension of Noble’s (1952), by implication the present work is
also.

Thus far we have asserted that the distribution of responses to any
free association stimulus⁴ forms the associative meaning of the
stimulus word and thus an associative concept named by that word.
If the associative meaning of a word (as opposed to any particular
stimulus-response pair) is to have any use, it must predict
something about verbal behavior. In this section we will discuss
some of the characteristics of verbal behavior that may be
predictable from the associative meanings of words and from
collections of associative meanings.

⁴ Free association stimuli are not limited to words, of course, and
the ideas presented here can be, in theory, extended to nonverbal
stimuli, but the lack of a clear criterion for the representational
response may produce difficulties in practice.

Associative meaning will not predict the tendency of words to
elicit one another; that is predicted by the free association
frequencies of particular pairs or by some general measure of pair
relation (such as the interword associative index). Associative
meaning, in general, should predict the words that will occur in
the verbal environment of a particular word. This statement entails
the assumption that words are used, subject to certain other
constraints, in particular environments because of the distribution
of associations they possess. Thus, the distribution of associates
takes on the character of mediation determining the use of
particular words.

Thus, if a particular word appears in some verbal environment, a
word close to it in associative meaning should appear in the same
environment. Generally, the closer the relation, the higher the
probability of the two words appearing in the same environment. The
words may appear in the same environment in two ways: (a) as
substitutes for one another, or (b) as part of one another’s
environment. It is not a part of associative meaning to predict
which of these two ways will appear, though such a prediction
should arise out of theory about the origins of associations.

The first kind of environmental relation (that of substitution) is
illustrated quite clearly by the highly related SLOW-FAST pair. The
environment that produces SLOW, all other things equal, is very
likely to produce FAST also, but not nearly so likely to produce
dictionary or thesaurus equivalents of FAST. Thus, we are likely to
say “this is a slow train” or “this is a fast train,” but we are
unlikely to say “this is a rapid train” or “this is a sluggish
train.”

The second kind of environmental relation (that of coordination) is
illustrated by the SKY-BLUE pair. Given the frame, “he _____ into
the _____ sky,” a highly likely substitution in the second blank is
“blue.” All words in English seem to be able to produce both kinds
of environmental relation, but for many words one or the other type
predominates. Thus, while “the blue” may appear as a substitution
for “sky,” such usage is poetic and rare.

One problem is produced by the apparent asymmetry of substitution
in particular frames. It is possible, indeed probable, that the
frame, “he _____ into the blue _____,” will not produce “sky” with
the same frequency that “blue” is produced in its complementary
frame. Particular test frames have an influence, and it is possible
that substitution frequencies may be symmetrical when averaged over
a representative group of test frames. Such symmetry is a property
of the definition of associative meaning, though the property
cannot be properly tested, since any failure of symmetry may easily
be attributed to a lack of adequate sampling.

Earlier it was pointed out that one of the consequences of the
assumption of the representational response was that perfect
agreement in associative meaning between two words would occur only
if they were perfectly reciprocal, that is to say, if they were the
only associates of each other. While this seemed at first to be an
undesirable property, it does conform with the limiting case of
substitution. For, if two words are perfectly correlated in
associative meaning, then they ought to be the only substitutes for
one another in the substitution relation. Notice that this
situation would not theoretically prevent these words from being
part of the associative meaning of a 
third word. Thus Word A could elicit 
only B as a free associate and B elicit 
A, which would make them overlap 
completely in associative meaning, but 
both A and B could be free association 
responses to a third Word C. Obvi- 
ously this situation would not happen 
in any real language, but an approxi- 
mation to it frequently occurs. The 
general rule for such a case is that the 
reciprocal pair of words are both of 
high frequency of usage, while the third 
word, which elicits them both but never 
occurs as a response to either, is a 
word of very low frequency of usage. 
An example in the Hopkins data is 
the overlapped pair WOMAN and GIRL, 
neither of which elicits VIRGINAL, but
both of which are elicited as responses 
by VIRGINAL. 
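
The limiting case described above can be checked numerically. The sketch below is only an illustration under stated assumptions: it treats the overlap between two words as the shared proportion of their response distributions (the sum of element-wise minima), and it adds the stimulus word to its own distribution as the representational response. The exact overlap coefficient used for the Hopkins data is defined earlier in the paper and may be normalized differently; the function names and the frequencies here are hypothetical.

```python
from collections import Counter


def distribution(stimulus, responses, n_subjects):
    """Relative response frequencies, including the representational response.

    Assumption: every subject implicitly gives the stimulus itself as a
    response, so the stimulus is added to its own distribution with a
    frequency equal to the number of subjects.
    """
    counts = Counter(responses)
    counts[stimulus] += n_subjects
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def overlap(dist_a, dist_b):
    """Shared proportion of two distributions (sum of element-wise minima)."""
    return sum(min(p, dist_b.get(word, 0.0)) for word, p in dist_a.items())


n = 50  # hypothetical number of subjects

# A elicits only B, and B elicits only A: a perfectly reciprocal pair.
a = distribution("A", {"B": n}, n)
b = distribution("B", {"A": n}, n)

# A third word C elicits both A and B but is never given as a response.
c = distribution("C", {"A": n // 2, "B": n // 2}, n)

print(overlap(a, b))  # complete overlap for the reciprocal pair
print(overlap(a, c))  # partial overlap with the third word
```

Under these assumptions the reciprocal pair overlaps completely, while each member still shares part of its distribution with the third word, which is the situation the WOMAN-GIRL-VIRGINAL example approximates.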

The use of test frames to validate the notion of associative
meaning will be difficult in practice, since the influence of
grammatical constraints, etc., needs to be removed. Furthermore, we
do not know enough about the problem of compounding to know whether
or not compound effects can be derived from the concept of
associative meaning itself. Nevertheless, whatever other
interesting effects may be achieved, the ultimate validation of the
concept of associative meaning must come from its ability to
predict substitutions in the flow of language.

The most important use of the notion of associative meaning is to
study the structure of sets of words having intrinsic psychological
interest. The number of such sets is very large, and indeed is only
limited by the ideas psychologists have about the influence of
psychological variables upon the use of words.

An obvious application is to any collection of words as
metaphorical symbols. Those words (or concepts)
which by psychoanalytic theory, for
example, are supposed to symbolize the 
phallus, the female figure, hostility, etc., 
ought to have their common symbolic 
character revealed in the interrelations 
and factor structure of their free asso- 
ciates. This is not to say that phallic 
symbol words ought to reveal a com- 
mon core of phallic meaning (by dic- 
tionary criteria) but that they ought to 
converge on some common factor. In a
word, no matter how remote their dic- 
tionary meanings and references in the 
real world, they should reveal a com- 
mon associative meaning. If they do 
not, it is hard to see how the particu- 
lar psychoanalytic symbol investigated 
could have any general validity for the 
population tested. 

Currently, the present author is 
studying the structure of associative 
meaning within a set of words that 
were selected from Rorschach protocols 
as indicators of aggression. No de- 
tailed account of these data can be 
given at present, but a striking qualita- 
tive aspect of them is the extent to 
which the overlap in associative mean- 
ing among them is mediated by a small 
number of response words, many of 
which appear out of context to be relatively
neutral (STICK, for example).

The technique may be used also to 
study the verbal organization associated 
with attitudes. Table 6 presents the 
factor structure of associations to words 
centering about religious values for in- 
dividuals who score high in religious 
value. A complete set of data is not 
yet available for low scorers, but it is 
almost certain that the most important 
difference between the high and low 
scorers will be in the extent to which 
the associations to these words are well 
organized. The factor loadings for low 
scorers will almost certainly be lower;
indeed, many of the stimulus words
(SERVICE, LOVE, HOPE, SPIRIT) will all
but disappear from the factor structure,
and it is very likely that the factor 
structure of the remaining words will 
be very different. 


ORIGIN OF ASSOCIATIONS AND 
SUMMARY 


The last question we face concerns 
the origin of associations. The view 
of Locke that attributed them to the 
accidental contingencies of nature has 
led to a kind of mosaic theory of mind, 
a theory that has scarcely ever had a 
serious rival on its own ground, though 
it has often been challenged. 

No real substitute for the contiguity-similarity postulate is
offered here. The highly organized economy of associative meaning
has impressed the author, however, and it was a belief that the
human mind derived associations from categories of its own that
sent him on the search for a technique by which to study
associative meaning. Thus, the least that can be offered is the
suggestion that associations derive in whole or part from the
structures or categories of the human mind. This belief is probably
the mover of attempts, such as Woodworth’s, to classify
associations. Such attempts are fruitless, however, for if there
are categories in association, they are categories of association,
not categories of subordination, coordination, etc. This is
admittedly the belief that motivated the present work.
Thus, the implicit distributions of re- 
sponses that define associative meaning 
may exhibit patterns of overlap because 
they are derived from simple structures. 
An adult may use the word BUTTERFLY 
to correspond to a class of living creatures
that fly and are bugs. This
doesn’t, however, exhaust the associative
meaning of BUTTERFLY, for most
flying bugs are dirty, unpleasant and
YELLOW, while BUTTERFLIES are pretty,
like BIRDS and flutter in the BLUE SKY.
Thus, another associative category intrudes
on the meaning of BUTTERFLY
and makes it an associative concept not
quite the same as any other. Contrary
to zoology, associative BUTTERFLIES are
as closely related to the BIRDS as to the
MOTHS.

Large collections of words ought to 
enable us to discover the categories of 
association, if they exist, and how they 
change with age, if they do. In any 
event, an alternative to the contiguity- 
similarity tradition in association theory 
can come about only by a study of the 
associations themselves, and the pur- 
pose of this paper is to present a 
somewhat different approach to the 
study of such associations. That approach
consists in the assumption that
the distribution of associates to any
word provides the associative meaning
of that word and the techniques necessary
to apply this assumption to data.


CONTIGUITY AND REINFORCEMENT IN RELATION
TO CS-UCS INTERVALS IN CLASSICAL
AVERSIVE CONDITIONING ¹


JOAN E. JONES ²


University of Sydney 


It is generally accepted that, in clas- 
sical aversive conditioning, the best re- 
sults are obtained if there is a temporal 
interval of approximately 450 milli- 
seconds between the presentation of the 
conditioned stimulus (CS) and the 
unconditioned stimulus (UCS). This
observation creates a problem for the- 
orists who adopt the common sense 
notion of contiguity as a basis for the 
association of events in classical condi- 
tioning. Such theorists formulate the 
problem as one of explaining why 
separation of CS and UCS is superior 
to contiguity and of why an interval of 
450 milliseconds is better than others. 


Because the problem has been formulated in this way, the
theoretical constructs proposed are rather arbitrary and have not
been subjected to experimental tests. For example, Hull (1952)
proposes a theoretical construct which bridges the temporal gap
between the stimulation of a receptor by the CS and the evocation
of the unconditioned response (UCR), namely the molar stimulus
trace. The same function is performed by Guthrie’s (1935) concept
of proprioceptive stimulation, though he does not explain why
certain trace conditions are actually superior to the simultaneous
presentation of CS and UCS. Cognitive theorists invoke the
perceptual principles governing grouping, particularly temporal
proximity (e.g., Tolman, 1949). Hilgard and Marquis (1940)
interpret the classical situation in terms of reinforcement rather
than of the contiguity of associated events, basing their
interpretation on the observation that the most favorable interval
appears to be somewhat greater than the latency of the conditioned
response (CR).

¹ Based on a thesis submitted in partial fulfilment of the
requirements for the PhD degree, University of Sydney, 1959,
supervised by R. A. Champion. The author expresses appreciation to
R. B. Bromiley and M. Humphries for their helpful criticisms of a
draft of this article.

² Now at Defence Research Medical Laboratories, Toronto.

Actually there is less agreement in the literature on optimum
intervals than the usual textbook statement implies. Experimental
evidence on the effect of the CS-UCS interval in conditioning human
subjects is summarized in Table 1. In the experiments cited the
classical defense method was used, with the exception of Wolfle’s,
where it appears that avoidance was possible. In considering Table
1 it should be noted that each of the studies reported has observed
the acquisition of only one response. In comparing any two studies
it must be recognized that many experimental conditions are varied
even where different experimenters have employed the “same”
response. The most important of these is the experimenter’s
original choice of intervals, which may impose severe limitation on
the optimum obtained. The range and the spacing of the chosen
values have contributed to both artifactual similarities and
dissimilarities in optima reported by different investigators. From
Table 1 it will be seen that all the studies employing short
latency responses (eyeblink and finger withdrawal) report optima of
500 milliseconds or less, and as short as 250 milliseconds, while
the studies in which responses of longer latency have been used
(pupillary dilation and GSR) yield optima of 450 milliseconds or
more, even as long as 3000 milliseconds. The difference is in the
direction expected by Hilgard and Marquis but the optima reported
in five of the six GSR studies are well short of the latency of
that response.


TABLE 1

EXPERIMENTAL STUDIES OF CS-UCS INTERVAL EFFECTS

Response           | Author                                | Intervals sampled*                                                   | Optimum*
Eyeblink           | McAllister (1953)                     | 100, 250, 450, 700, 2500                                             | 250
                   | Bernstein (1934)                      | -900, -500, 100, 200, 250, 300, 500, 1000, 1480                      | 300
                   | Hansche and Grant (1960)              | 190, 390, 590, 790                                                   | 390 (training and extinction)
                   | Kimble (1947)                         | 100, 200, 225, 250, 300, 400                                         | 400
                   | Reynolds (1945)                       | 250, 450, 1150, 2250                                                 | 450
                   | Myers (1950)                          | 500, 1000, 1500, 2000                                                | 500
Finger withdrawal  | Wolfle (1932)                         | -2000, -1000, -600, -200, 0, 200, 300, 400, 600, 1000, 2000, 3000    | 300 (avoidance possible)
                   | Wolfle (1930)                         | -500, -250, 0, 250, 500, 750, 1000, 1250, 1500                       | 500 (avoidance possible)
                   | Spooner and Kellogg (1947)            | -500, -250, 0, 500, 1000, 1500                                       | 500
Pupillary dilation | Gerall and Woodward (1958)            | 0, 500, 1500, 2500                                                   | 1500
GSR                | Moeller (1954)                        | [illegible]                                                          | 450
                   | White and Schlosberg (1952)           | 0, 250, 500, 1000, 2000, 4000                                        | 500 (extinction)
                   | Wickens, Gehman, and Sullivan (1959)  | [illegible]                                                          | 500 & 1500 (two optima)
                   | Jones (1959)                          | 240, 550, 740, 1060, 1575                                            | 550 (training), 1060 (extinction)
                   | Jones (1959 unpublished)              | [illegible]                                                          | 1000
                   | Bierbaum (1959)                       | -3000, 0, 500, 3000, 5000                                            | 3000

* In milliseconds.


This paper presents an interpretation of the effect of the CS-UCS
interval in terms of the contiguity of the CS and the UCR and of
reinforcement. This interpretation does not begin with the
assumption of an invariant optimum interval but accepts the
empirically demonstrated fact that no single optimum exists. The
propositions to be outlined differ from existing two-process
theories of conditioning. Whereas Skinner (1935) and Schlosberg
(1937) invoke one principle (contiguity) in classical conditioning
and another (reinforcement) in instrumental conditioning, the
present theory invokes the action of both principles in classical
conditioning. It differs from Mowrer’s (1947) two-factor theory of
classical conditioning in that it contains no reference to
intervening emotional reactions and no distinction is made between
the principles governing the conditioning of responses on the basis
of their neurophysiological characteristics.


A TWO-PRINCIPLE THEORY OF
CLASSICAL CONDITIONING

When the CS-UCS interval is 
varied two temporal relationships are 
affected. One is the interval between 
the CS and the UCR, pertaining to 
the principle of S—R contiguity. The 
other is the interval between the CR 
and the UCS, which defines the delay 
of reinforcement. These changes are 
effected at the same time but may 
differ in extent and/or direction. 

The following assumptions are made 
about the operation of S—R contiguity 
and of reinforcement in classical 
conditioning : 

S-R Contiguity. The optimum con- 
dition for the establishment of a new 
S-R connection is that the CS and 
the UCR be contiguous in time. The 
greater the degree of temporal separation
of these events (in either direction)
the weaker the resulting connection.

Reinforcement. The optimum con- 
dition for the strengthening of S—R 
connections is that the CR coincides 
with a reinforcing state of affairs. 
The greater the degree of temporal 
displacement of the reinforcing event 
and the CR (in either direction) the 
less effective the reinforcement. 

The first statement modifies the 
strict contiguity principle maintained, 
for example, by Hull and by Guthrie 
by suggesting that a gradient operates 
both forward and backward in time. 
It is reasonable to suppose that
connections based on strict contiguity
and those based on lesser or greater
degrees of discontiguity differ
quantitatively rather than qualitatively.
The gradient can be used in this
connection though the mechanism
underlying it may not be specifiable
at present. The question of
whether the gradient of reinforcement
is unidirectional or bidirectional is a 
controversial one. The problem is 
complicated by conflicting experi- 
mental evidence and by a confusion of 
empirical and theoretical statements. 
Neither of the statements made above 
is meant to be an empirical generali- 
zation; as assumptions they may be 
justified by their usefulness in 
prediction. 

Historically, the delay of reinforce- 
ment principle belongs to instrumental 
learning where in some situations the 
experimenter can vary the period of 
delay by withholding the reward for 
any predetermined period of time after 
the subject makes the required re- 
sponse. He cannot do so in classical 
conditioning where, by definition, the 
UCS is presented in a fixed temporal 
relationship to the CS regardless of 
whether, or when, the CR is evoked.
The fact that the delay is not under 
direct experimental control does not, 
however, exclude the possibility that 
the delay of reinforcement principle 
applies in classical conditioning. Hil- 
gard and Marquis (1940) recognize 
this and interpret classical condition- 
ing in reinforcement terms but require 
that the CR be a preparatory reaction 
in which case the effect must be 
unidirectional. 

The present interpretation is less 
specific than that of Hilgard and Mar- 
quis. It would seem best at present 
to follow Spence in defining a reinforc- 
ing state of affairs empirically as be- 
longing to that class of “environmental 
events exhibiting the property of in- 
creasing the probability of occurrence 
of responses they accompany” (Spence, 
1956, p. 32). If the reinforcing event
is defined in this way, the CR does
not have to be the direct means of
bringing it about for strengthening to
occur; it may be facilitated if a
reinforcing event occurs in its temporal
neighborhood. Just what event in
classical conditioning comprises a re- 
inforcing state of affairs is a matter 
of discovery. However, because the 
UCS is the only event which is sys- 
tematically associated with the S-R 
sequence it is reasonable to suppose 
that some feature of the UCS presenta- 
tion is the reinforcer, most likely its 
onset or its offset. In conditioning
fear reactions, data presented by
Mowrer and Solomon (1954) indicate that
UCS onset is the critical event. 
Where the UCS is of constant (and 
short) duration either assumption will 
lead to essentially the same prediction 
and in the following discussion UCS 
onset is treated as the reinforcing 
event. 

It is proposed to adopt tentatively the assumption of Hull (1952),
Guthrie (1935), and of Hilgard and Marquis (1940) that the
occurrence of a response at an effector is the event whose temporal
relations to other events are crucial in establishing learned
connections. While it is recognized that the latency of the UCR
often differs from the latency of the CR and that both vary from
trial to trial, for purposes of the present discussion the two will
be treated as if equal and constant.

Consider the application of the theory to the limiting case, i.e.,
where the response latency is zero. If the CS-UCS interval is zero,
the UCR starts at the onset of the CS and the CR starts with UCS
onset. Therefore, simultaneous presentation of CS and UCS is most
satisfactory as then both the gradient of contiguity and the
gradient of reinforcement are at maximum strength. If the
interstimulus interval is lengthened in either the forward or
backward direction then both the relevant intervals are lengthened
and conditioning is poorer.

In practice the response latency (x) is always greater than zero.
In this case the CS-UCS interval which most favors contiguity
between CS and UCR is -x (that is, backward presentation where
UCS-CS = x). However, if that is so, the CR is separated from UCS
onset, the reinforcing state of affairs, by an interval of 2x, so
that reinforcement is relatively weak. Strongest reinforcement will
occur where the CS-UCS interval equals +x, but then the CS and UCR
are separated by an interval of 2x duration. So, if the
interstimulus interval
is such as to favor the effect of either 
the gradient of contiguity or of re- 
inforcement, it necessarily follows that 
the other is operating at less than 
maximum efficiency. The longer the 
latency of the response the greater the 
relative disadvantage to the nonfavored 
mechanism. 
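
The arithmetic behind these statements can be laid out in a few lines. The sketch below is an illustration only: it measures time from CS onset, follows the simplifying assumption made above that UCR and CR latencies are equal and constant, and uses a hypothetical latency of 200 milliseconds. The function name and the numbers are not taken from any study cited here.

```python
def separations(cs_ucs_interval, latency):
    """Return (CS-to-UCR, CR-to-UCS) separations for one stimulus arrangement.

    Time is measured from CS onset. A negative cs_ucs_interval means
    backward presentation (UCS before CS). UCR and CR latencies are
    treated as equal and constant.
    """
    ucs_onset = cs_ucs_interval
    ucr_onset = ucs_onset + latency  # the UCR follows the UCS by the latency
    cr_onset = latency               # the CR follows the CS by the latency
    cs_to_ucr = abs(ucr_onset)       # relevant to S-R contiguity
    cr_to_ucs = abs(ucs_onset - cr_onset)  # the delay of reinforcement
    return cs_to_ucr, cr_to_ucs


x = 200  # hypothetical response latency in milliseconds

for interval in (-x, 0, x):
    print(interval, separations(interval, x))
```

With the interval at -x the CS and UCR coincide while the CR and UCS onset are 2x apart; at +x the roles are reversed; at zero both separations equal x.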

In predicting the optimum interval 
the combination of overlapping gradi- 
ents must therefore be considered. 
The predictions made will differ ac- 
cording to whether the manner of their 
combination is assumed to be additive 
or multiplicative. The multiplicative 
relationship assumes that both the 
gradients must be at greater than zero 
strength for conditioning to occur. 
That is, neither alone is sufficient for 
the establishing and/or strengthening 
of conditioned responses; both must 
operate in conjunction throughout the 
learning process. But the reinforce- 
ment principle is concerned with tem- 
poral relationships between the CR and 
the UCS. This gradient cannot, by 
definition, have an effect on an S-R 
tendency until the strength of the tend- 
ency is such as to bring the CR above 
threshold. That is, the effects of re- 
inforcement are not felt until condi- 
tioning is already somewhat underway. 
How, then, does the CS come to elicit
the CR in the first place? The initial
connection would appear to depend
on the operation of the contiguity principle
alone. Unless the arrangement
of stimuli is such that the CS and 
UCR fall within the effective range of 
this gradient the CR will never be es- 
tablished. As the strength of the new 
connection reaches threshold value re- 
inforcement begins to play a role. As 
the connection grows stronger and the 
response becomes more consistent the 
importance of reinforcement becomes 
correspondingly greater, until eventu- 
ally the effect of contiguity is negligible 
by comparison. It is more meaningful, 
therefore, to adopt the assumption that 
the gradients combine in an additive 
fashion weighting the contribution of 
each in accordance with its changing 
importance. 
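
The argument against a multiplicative rule can be made concrete with a first-trial example. The sketch below is a hypothetical illustration: the gradient heights are arbitrary numbers chosen for the example, and the function is not part of the theory as stated.

```python
def combine(contiguity, reinforcement, rule):
    """Combine the two gradient values at one CS-UCS interval."""
    if rule == "additive":
        return contiguity + reinforcement
    if rule == "multiplicative":
        return contiguity * reinforcement
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical first trial: the CR has not yet occurred, so the
# reinforcement gradient contributes nothing, while the CS and UCR are
# close enough in time for contiguity to be effective.
first_trial = {"contiguity": 0.8, "reinforcement": 0.0}

print(combine(**first_trial, rule="multiplicative"))  # no growth is possible
print(combine(**first_trial, rule="additive"))        # contiguity alone acts
```

Under the multiplicative rule a zero reinforcement term cancels the contribution of contiguity, so the connection could never begin to form; under the additive rule contiguity alone can bring the response to threshold.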

The exact character of the predicted 
outcome depends on assumptions which 
are made concerning the shape, relative 
extent, and relative heights of the 
gradients. In the interests of simplic- 
ity and in the absence of direct evi- 
dence to the contrary it is assumed that 
both the contiguity and reinforcement 
gradients are linear. Though their 
actual extent is not specified it is en- 
visaged that it would be measurable 
in terms of seconds or even milliseconds. 
For the present it is assumed, again
for simplicity and in the absence of
direct evidence, that both gradients are
symmetrical and that they are equal
in extent. The change in the relative
importance of the two mechanisms is 
represented by changing the relative 
heights of the gradients according to 
the stage of training, while their ex- 
tent remains constant throughout. 
Summarizing, the gradients are linear,
symmetrical, and equal and limited in
extent. The reinforcement gradient is
at minimum height on the first trial
(zero where there is no tendency for
the CS to elicit the response to be
conditioned before training begins)
and rises as a function of the number
of reinforced trials. This rise is
accompanied by a fall in the height of
the contiguity gradient indicating its 
relative decline in importance once the 
CR is established. The gradients com- 
bine in an additive fashion to affect the 
strength of the learned S—R connec- 
tion, which is reflected in performance, 
other things being equal. 
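
The assumptions summarized above are enough to sketch a performance function. The code below is a minimal illustration, not the author's computation: the extent of the gradients, the latency, and the linear schedule by which the heights change with training are all hypothetical choices made for the example. The contiguity gradient is centered at -x and the reinforcement gradient at +x, so the interval axis has its zero midway between the two maxima.

```python
def gradient(interval, peak, extent):
    """Linear, symmetrical gradient: 1 at `peak`, 0 at `extent` or more away."""
    return max(0.0, 1.0 - abs(interval - peak) / extent)


def performance(interval, latency, extent, stage):
    """Additive combination of the two gradients at a given training stage.

    Assumption: `stage` runs from 0 (first trial) to 1 (late training);
    the reinforcement height rises linearly with it and the contiguity
    height falls linearly. The text fixes only the direction of change.
    """
    contiguity = (1.0 - stage) * gradient(interval, -latency, extent)
    reinforcement = stage * gradient(interval, latency, extent)
    return contiguity + reinforcement


def optimum(latency, extent, stage, grid):
    """Interval on `grid` giving the strongest predicted connection."""
    return max(grid, key=lambda i: performance(i, latency, extent, stage))


grid = range(-1000, 1001, 50)  # candidate CS-UCS intervals, milliseconds
early = optimum(latency=200, extent=1000, stage=0.2, grid=grid)
late = optimum(latency=200, extent=1000, stage=0.8, grid=grid)
print(early, late)
```

With these values the predicted optimum lies at -x early in training and at +x late in training, a shift of twice the latency. Because the gradients are taken to be linear, the optimum in this sketch moves abruptly once the reinforcement height overtakes the contiguity height; a different gradient shape would give a more gradual shift.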

As the absolute extent of the gradi- 
ents is not at present specifiable the 
amount of overlap for a response with 
a given latency is not known. However, 
relative statements can be made about 
the overlap for responses of short, me- 
dium, and long latencies. The over- 
lapping gradients of contiguity and 
reinforcement are shown on the left 
of Figure 1 for responses of various 
latencies at various stages of training. 
The latency of the response is shown 
in the schematic representation by the 
amount of separation at the maxima 
of the gradients. Thus, Part A of 
Figure 1 represents the hypothetical 
case where latency equals zero. The 
upper dotted line represents the gradi- 
ent of contiguity early and the gradient 
of reinforcement late in training. Con- 
versely, the lower dotted line indicates 
the gradient of reinforcement early 
and that of contiguity late in training. 
The solid line represents two gradients 
when equal in height at an intermediate 
stage of training. Parts B, C, and D 
show the situation where the latencies 
are short, medium, and long respec- 
tively. If latency equals x then the 
distance between the maxima equals 
2x. Parts BI, BII, and BIII of Fig- 
ure 1 show the change in relative con- 
tributions of the gradients at three 
stages of training, early, middle, and 
late, respectively. Thus, in Part BI,
early in training, the height of the
contiguity gradient exceeds that of the
reinforcement gradient, in keeping
with the argument that reinforcement
is relatively ineffective at this stage.
In Part BII the two gradients are
equal in height, while in Part BIII 
the gradient of reinforcement is higher 
than the gradient of contiguity. Parts 
CI, CII, and CIII and DI, DII, and 
DIII represent the same changes in 
importance of the gradients for re- 
sponses with longer latencies. 

The corresponding performance functions relating response strength
to CS-UCS interval are shown on the right of Figure 1, each of
these functions being obtained by additive combination of the
gradients on the left.³

³ The value zero on the abscissae of the performance functions does
not denote the simultaneous presentation of CS and UCS by the
experimenter. The gradient of contiguity operates about a neural
event consequent upon the occurrence of the CS rather than about
the CS itself. In order to transform the functions shown here into
forward and backward experimental arrangements of stimuli the zero
point should be shifted to the left by an amount corresponding to
the temporal lag between the CS and its neural counterpart. This
distinction has not been made in labeling the figure as the
predictions involve intervals of relative rather than of absolute
lengths.

Consideration of these performance functions leads to the following
deductions from the theory:

1. With responses of latency greater than zero, the optimum
interval for conditioning shifts with the amount of training (Parts
BI, BII, and BIII). The direction of this shift is from a
relatively short interstimulus interval early in training to a
relatively long one later. The amount of shift is greater the
longer the latency of the response being conditioned. (This shift
in Parts BI-III may be compared with that in Parts CI-III and
DI-III.)

2. With responses of such a latency that the overlap of the
gradients is slight (Part D) a second peak appears in the
performance function if the CS tends to elicit the response to be
conditioned from the outset, as is the case where a weak response
tendency is being strengthened rather than a new connection being
formed. The longer the response latency, the greater is the
separation of the peaks. Two experiments which fulfill these
necessary conditions reveal the double-peaked effect. Wickens,
Gehman, and Sullivan (1959) report 500 milliseconds best with a
second peak at 1500 milliseconds and Jones (1961) obtained peaks at
440 and 1045 milliseconds. The conditions for producing the second
peak are fairly common, in particular with “emergency” responses
such as the GSR which was the response used in
both the experiments cited. With
many other responses, however, no 
such overt tendency is present before 
training and in some cases subjects 
who respond to the CS on a pretrain- 
ing test are discarded. It might be 
argued that a covert tendency to re- 
spond is always present and all that 
is ever observed is the strengthening 
of existing response tendencies, rather 
than the establishing of new ones. 
However, it is assumed here that for 
strengthening by reinforcement to 
occur the response tendency must first 
rise above threshold so that an overt 
response, though perhaps a weak one, 
occurs in the temporal neighborhood 
of a reinforcing state of affairs. The 
mechanism which acts to bring the 
latent tendency above threshold is con- 
tiguity of the CS and the required 
overt response, this contingency being 
arranged experimentally by the pres- 
entation of the UCS. Where the CS 
and UCR are too widely separated the 
response tendency cannot be brought 
to the necessary strength in the experi- 
mental session. Hence, only when the 
tendency has already reached the requi- 
site level, can the range of intervals 
spanned exclusively by the reinforce- 
ment gradient be effective in further 
strengthening the tendency in question. 
Hilgard and Marquis (1940) fail to 
indicate what mechanism acts to bring 
an initially subthreshold response 
tendency to sufficient strength for 
the operation of their reinforcement 
principle. 

3. If measures of response strength 
are summed over all stages of training, 
the longer the response latency the 
wider the range of intervals which are 
effective in producing conditioned re- 
sponses if the CS from the outset tends 
to elicit the response to be conditioned 
(compare the range of CS-UCS values
embraced by the performance functions
in Parts B, C, and D). In addition,
the performance function is less
sharply peaked with longer latency responses.
This prediction was tested
by Jones (1961) in an experiment 
where two responses which were 
evoked by the same CS but differed 
in latency were conditioned in the same 
subject by the classical method. It 
was found that the function for the 
conditioned GSR was flatter and 
broader than that for finger movement. 

4. Where the response latency is so 
long that the gradients do not overlap, 
the range of intervals covered by the 
contiguity gradient will give increased 
response strength corresponding to the 
height of that gradient early in training 
but the strength of the tendency will 
not continue to increase and may de- 
cline due to the lack of reinforcement. 
No other intervals will yield any con- 
ditioned responses unless the CS elicits 
the response to some extent from the 
outset. This case is not shown in Fig- 
ure 1 but is an extension of Prediction 
2 above. 
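
The way the deductions above follow from an additive combination of two gradients can be made concrete with a small numerical sketch. Everything specific in it is an assumption made for illustration and is not taken from Figure 1: the Gaussian shape and 400-millisecond width of each gradient, the early and late weights, and the timing rule that places the peak of the contiguity gradient at an interval of minus the response latency and the peak of the reinforcement gradient at plus the response latency. Intervals are relative values in the sense of Footnote 3, not intervals as arranged by the experimenter.

```python
import math


def gradient(separation_ms, width_ms=400.0):
    """Symmetric (bidirectional) gradient: 1.0 at zero separation, smaller as
    the two events move apart in either direction. The Gaussian shape and the
    width are illustrative assumptions."""
    return math.exp(-((separation_ms / width_ms) ** 2))


def performance(interval_ms, latency_ms, w_contiguity, w_reinforcement):
    """Additive combination of the two gradients for one CS-UCS interval."""
    # Assumed timing: the UCR follows the UCS by the response latency, so the
    # CS and the UCR are separated by interval + latency.
    contiguity = gradient(interval_ms + latency_ms)
    # Assumed timing: the CR follows the CS by the same latency, so the CR and
    # the UCS (reinforcement) are separated by interval - latency.
    reinforcement = gradient(interval_ms - latency_ms)
    return w_contiguity * contiguity + w_reinforcement * reinforcement


def optimum_interval(latency_ms, w_contiguity, w_reinforcement):
    """Interval (on a 10 ms grid) giving the strongest predicted response."""
    grid = range(-1500, 1501, 10)
    return max(
        grid,
        key=lambda t: performance(t, latency_ms, w_contiguity, w_reinforcement),
    )


# Hypothetical weights: contiguity dominates early, reinforcement late.
EARLY = (0.8, 0.2)
LATE = (0.2, 0.8)


def optimum_shift(latency_ms):
    """Late optimum minus early optimum for a response of the given latency."""
    return optimum_interval(latency_ms, *LATE) - optimum_interval(latency_ms, *EARLY)
```

Under these assumptions the optimum moves from a shorter to a longer interval as the weights change, and `optimum_shift` grows with response latency, which is the pattern stated in Prediction 1. The sketch does not attempt to model the threshold mechanism discussed under Prediction 2.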

The above predictions differ sharply 
from any to be derived from other 
approaches to the problem of the in- 
terval effect. Apart from the analysis 
by Hilgard and Marquis there has 
been no explicit attempt to relate the 
effect to response latency. It is possible,
however, to deduce the appropriate
interactions. Cognitive theory, which 
relates the efficacy of learning to the 
perceptual grouping of two stimulus 
events or the internal representations 
of such events, implies that characteris- 
tics of the responses evoked are irrele- 
vant. Therefore, although it is difficult 
for a cognitive theorist to explain the 
superiority of a particular interval, he 
certainly expects to find the same in- 
terval superior whatever the latency 
of the CR or UCR. Hull is concerned 
with the contiguity of the stimulus 
trace and an effector activity as is clear 
from his statement of the primary reinforcement
postulate (Hull, 1952, p.
5, Postulate III). Therefore, if the 
trace is at its maximum strength after 
450 milliseconds, then: (a) an inter- 
stimulus interval of 450 milliseconds 
is superior only when the UCR has a 
latency approaching zero; (b) the 
longer the latency of the UCR the 
shorter should be the optimum CS- 
UCS interval; (c) where the response 
latency is longer than 450 milliseconds 
backward presentation of CS and UCS 
is most favorable; (d) optima longer 
than 450 milliseconds cannot occur re- 
gardless of the characteristics of the 
response. Guthrie also implies that a 
response rather than a neural event 
must be contiguous with the cue
(which is almost certain to be a mediating
stimulus event). It follows from
Guthrie’s analysis, as it does from
Hull’s, that the longer the latency of
the UCR the shorter the optimum
interval if the UCR is to be contiguous
with a consistently occurring “true”
CS. None of these positions are supported
by the experimental evidence
at present available.

The deductions from the two-principle
theory may be formulated in
terms of a comparison of trends in 
the learning curves for different inter- 
stimulus intervals. With response 
latency held constant, it is predicted 
that short-interval groups show a com- 
paratively rapid initial improvement 
and the acquisition curves then flatten 
out and may even decline (see the 
rise and fall of the left wing of the 
performance function in Figure 1, 
Parts CI-III). Longer-interval groups 
(right-wing of the performance func- 
tion) show a less rapid rise initially 
but eventually reach, and may even 
surpass, the shorter groups. The rela- 
tionship between the curves is shown 
in Figure 2. The other theories mentioned
above do not anticipate a
change in pattern among the curves
during acquisition but can predict only
that the pattern established early becomes
more distinct as training progresses.
In a direct test of the prediction,
Jones (1961) found the expected
pattern among the acquisition
curves for groups in which the GSR
was conditioned at various intervals
from 20 to 1245 milliseconds.

FIG. 2. Relationships among performance
curves in the acquisition of a CR with various
CS-UCS intervals.

Further evidence supporting the 
prediction is found in the learning 
curves obtained by Moeller (1954) for 
the GSR and McAllister (1953) for 
the eyeblink which show that longer- 
interval groups approach their perform- 
ance asymptote at an initially slower 
rate than shorter-interval groups. 
Jones (1959) observed that groups in 
which the GSR was conditioned with 
550 and 1060 milliseconds intervals 
respectively approached approximately 
the same asymptote but that the 550 
milliseconds group did so much more 
rapidly. It may be noted that White 
and Schlosberg (1952), who report a 
short optimum of 500 milliseconds for 
the GSR using measures taken on ex- 
tinction trials, trained their subjects 
for but five conditioning trials. 

It is often reported of simultaneous 
and backward presentation of the CS 
and UCS that the performance curves 
show a brief initial rise followed by 
a gradual decline (Spooner & Kellogg, 
1947). This is the type of effect that 
would be expected under these conditions
from the analysis given here.
The explanation commonly offered invokes
a new mechanism sometimes
referred to as “sensitization” but no 
additional concept is required to incor- 
porate the effect here. Backward con- 
ditioning emerges as a special case of 
the interval effect rather than differing 
qualitatively from forward condition- 
ing. Hull (1943, p. 172) suggested a 
treatment of this kind in his discussion 
of Wolfle’s data but his later develop- 
ment of the stimulus trace concept 
made such interpretation difficult 
where the UCR as well as the UCS 
preceded the CS. Confirming the pos- 
sibility of real backward conditioning, 
Champion and Jones (1961) found 
that a group in which the GSR was 
conditioned with a backward interval 
of 750 milliseconds performed better 
than did a pseudoconditioning control 
group. 

Another prediction which follows
from the present interpretation concerns
the effect of a shift of interstimulus
interval during the course of
training. Suppose that for a group of 
subjects forward conditioning begins 
with a short interval in the range most 
favorable for the effect of contiguity 
until the CR begins to appear fairly 
consistently. If at this stage the inter- 
val is changed to a longer one, more 
favorable for the effect of reinforce- 
ment, performance will surpass that of 
control groups conditioned throughout 
at either the shorter or the longer in- 
terval. Further, long latency responses 
are able to tolerate larger shifts of this 
kind than short latency responses. 

If the CS-UCS interval is held con- 
stant and the learning curves for re- 
sponses of different latencies are com- 
pared they will differ in shape in a 
manner depending on the range of val- 
ues in which the interval is held. At 
an intermediate range of interval val- 
ues the curves for different responses 
have the same shape because contiguity
and reinforcement are favored equally
regardless of response latency. How- 
ever, with intervals in the shorter 
range (e.g., as in backward condition- 
ing) separation of CR and UCS onset 
(reinforcement) is much greater than 
the separation of CS and UCR (con- 
tiguity) and the difference between 
the two is emphasized with a long 
latency response. As it has been pro- 
posed that contiguity is of greatest im- 
portance early in training but that re- 
inforcement gains in importance dur- 
ing the later stages, the longer latency 
response shows the more rapid initial 
rise, but later in training the shorter 
latency response reaches and maintains 
a higher level of performance. Con- 
versely, if the interval is lengthened 
beyond the intermediate range (as in 
forward conditioning) then contiguity 
is more adversely affected than is reinforcement.
This difference is again
emphasized where the response has a
long latency. Therefore, in this case
the performance curve for the short
latency response shows a more rapid
initial rise but is later surpassed by
that for the long latency response.
The predicted relationship was obtained
by Jones (1961) in conditioning
finger withdrawal and GSR with
forward intervals. 

The two-principle theory of classical 
conditioning put forward in this paper 
leads to the set of predictions outlined 
above, some of which have already 
been confirmed. These predictions 
cannot be derived from other theories 
in their present form. No theorist to 
date has discussed the interval effect 
in relation to the course of learning 
and only Hilgard and Marquis have 
attempted to relate the optimum CS- 
UCS interval to the latency of the 
response being conditioned. When
these factors, i.e., amount of training
and response latency, are considered
in terms of the constructs offered by
others in explanation of the interval
effect, the predictions of the interactions
concerned differ markedly from
those of the present interpretation. 


SUMMARY 


The function relating the interval 
between the onset of the CS and the 
UCS to response strength in classical
aversive conditioning is interpreted in
terms of the joint action of two principles.
Early in training the efficacy
of conditioning depends mainly on the 
contiguity of CS and UCR but later 
mainly on the proximity of the CR to 
reinforcement (UCS). In both cases 
a bidirectional gradient is assumed to 
operate such that the greater the dis- 
contiguity of the events the less effec- 
tive the learning. It is predicted that 
variations in experimental procedure, 
such as the latency of the response 
being conditioned, which affect either 
one or both of these relationships affect 
the function relating the CS-UCS interval
to response strength and that
the shape of the learning curves depends
on the CS-UCS interval used
in training.

This paper introduces a new deterministic
model for analyzing decision
making in gambling, the pattern and 
level of risk (PLR) model. At the 
beginning of the paper, the experi- 
mental evidence for four traditional 
models is examined and the inade- 
quacies of these models shown. Then 
the PLR model is described, predic- 
tions are drawn from the model, and 
evidence is presented to support the 
predictions. Finally some implications 
of the PLR model are shown. 


1 This paper was completed during tenure
of a United States Public Health Service
postdoctoral research fellowship as a Research
Associate in the Program of Graduate
Training and Research in International
Relations at Northwestern University.

The author is indebted to C. H. Coombs,
Ward Edwards, Harold Guetzkow, and
John D. Pruitt for their advice


THE EVIDENCE FOR TRADITIONAL
MODELS


Edwards (1955) described four
related deterministic models which
have, in large part, guided experimentation
on gambling decisions. All
four assume that a gambler will choose
the alternative (e.g., a bet) with the
maximum value, and that the value of
an alternative can be calculated by
multiplying the value of each of its
outcomes by the probability that the
outcome will occur and summing these
products over all outcomes. The
models differ with respect to whether
objective or subjective values and
probabilities are used in these calculations.
The four models are defined
by the following equations for calculating
the value of an alternative:

\[ EV = \sum_i p_i \$_i \tag{1} \]

\[ SEM = \sum_i p_i^* \$_i \tag{2} \]

\[ EU = \sum_i p_i u_i \tag{3} \]

\[ SEU = \sum_i p_i^* u_i \tag{4} \]



Equation 1 defines expected value
(EV), where \(p_i\) and \(\$_i\) stand for the
objective probability and dollar value
of the ith outcome and \(\sum_i\) means that
the products \(p_i \$_i\) are summed over all
of the potential outcomes of the alternative.
Equation 2 defines subjectively
expected money (SEM), where \(p_i^*\)
refers to the subjective probability of
the ith outcome and the other symbols
have the same meanings as above.
Equation 3 defines expected utility
(EU), where \(u_i\) refers to the utility
(subjective value) of the ith outcome
and the other symbols have the same
meanings as above. Equation 4 defines
subjectively expected utility
(SEU); all of the symbols have the
same meanings as in the other models.
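
The four definitions differ only in whether the probability and the value of each outcome enter objectively or subjectively, which a short sketch can make explicit. The function names, the example bet, and above all the two transformations `squared` and `losses_doubled` are hypothetical stand-ins chosen for easy arithmetic; they are not measured subjective probability or utility functions from any of the studies cited.

```python
from typing import Callable, List, Tuple

# One bet: a list of (objective probability, dollar value) pairs, one per outcome.
Bet = List[Tuple[float, float]]


def identity(x: float) -> float:
    return x


def bet_value(
    bet: Bet,
    subjective_probability: Callable[[float], float] = identity,
    utility: Callable[[float], float] = identity,
) -> float:
    """Sum over outcomes of (possibly subjective) probability times value."""
    return sum(subjective_probability(p) * utility(d) for p, d in bet)


def ev(bet: Bet) -> float:
    return bet_value(bet)  # Equation 1: objective probability, dollar value


def sem(bet: Bet, subj: Callable[[float], float]) -> float:
    return bet_value(bet, subjective_probability=subj)  # Equation 2


def eu(bet: Bet, util: Callable[[float], float]) -> float:
    return bet_value(bet, utility=util)  # Equation 3


def seu(bet: Bet, subj: Callable[[float], float],
        util: Callable[[float], float]) -> float:
    return bet_value(bet, subjective_probability=subj, utility=util)  # Equation 4


# Hypothetical transformations, for illustration only.
def squared(p: float) -> float:
    return p * p


def losses_doubled(d: float) -> float:
    return d if d >= 0 else 2 * d


example_bet: Bet = [(0.25, 3.0), (0.75, -1.0)]  # win $3 or lose $1
```

With these stand-ins the same bet, whose EV is zero, receives a different value under each of the other three models, which is why the models can disagree about which of two bets will be chosen.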

In evaluating these models, three 
criteria will be kept in mind: (a) the 
range of alternatives or choice situa- 
tions within which each model has 
predictive value; (b) the accuracy 
with which each model predicts decisions
within its range; and (c) in
ranges where both have predictive
value, whether a model containing a
subjective parameter is any better than
the comparable one containing an objective
parameter. The last consideration
is important from the point of
view of parsimony. If a model with 
an objective parameter is as good at 
predicting decisions within a certain 
range as the comparable model with a 
subjective parameter, then the former 
should be preferred because it requires 
no special measurement procedures. 


The EV Model

The EV model (and the SEM and
EU models as well) is, of course, inapplicable
in situations where the objective
parameters which compose it
are not defined. In addition, it is un- 
likely to be valuable where objective 
parameters, though defined, are not in 
some way communicated to the gam- 
bler or where individual tastes are 
obviously important in determining the 
value of outcomes (as in preference 
among foods). There remains a hard 
core of situations in which objective 
probabilities are fairly apparent and
outcomes are money or objects easily
convertible into money. Here the EV
model is at least worthy of consideration.
Even when these conditions are
met, the EV model is known to be 
grossly inadequate in the range of bets 
which have very small probabilities of 
winning or losing large amounts of 
money (Allais, 1953; Bernoulli, 1954). 
The inaccuracy of the EV model in 
these cases is shown by the willingness 
of people who are well informed about 
the probabilities to buy lottery tickets 
and insurance (where EV is consider- 
ably less than zero) and to play the 
St. Petersburg Paradox only at very 
low stakes (where EV is infinite at 
any stake). 

In the range of bets which have
moderate probabilities and values of
outcomes, research findings seem to
support the following generalization:
The EV model is a poor predictor of
decisions where EV differences are
small, but improves in predictive
capacity as EV differences increase.
Thus, the model is useless for predicting
choices between bets with the
same EV; the model predicts indifference
between such bets whereas many
experiments have revealed consistent 
preference (e.g., Coombs & Komorita, 
1958; Edwards, 1953). Mosteller 
and Nogee (1951) found that predic- 
tion improved with an increase in EV 
difference as follows: Where the dif- 
ference in EV between two two-out- 
come bets lay between 2¢ and 50¢, 
only 42% of the predictions were correct
(50% would be expected by
chance); between 55¢ and $1.55, 57%
were correct; between $1.80 and $2.20,
60% were correct; and between $2.50 
and an undisclosed maximum, 67% 
were correct. These percentages would 
undoubtedly have improved somewhat 
if the bets had been replicated and 
“dominant” preferences established;
but they are low enough to suggest 
that the EV model leaves much to be 
desired as a predictive device. 

One other generalization about the 
scope of the EV model is suggested by 
an experiment by Edwards (1954a):
The less complex the bets between
which a decision is made, the more
accurate will the EV model be in predicting
these decisions.


The SEM Model 

There is one kind of bet, very common
in everyday gambling situations
such as roulette or horseracing, in
which the SEM model makes grossly
incorrect predictions. This is the bet
in which a person may stake as much
money as he likes on a chance event
and wins a fixed multiple of his stake
if the chance event occurs. In such
cases, the SEM model (and the EV
model) predicts that if a man will
bet at all on a chance event, he
will put all the money he has on
that chance event, a prediction which
is incompatible with everyday observations
of gamblers. However, in the
typical laboratory situation, where
only two bets are available at a time,
the model does not lead to absurd
predictions.
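
The all-or-nothing prediction arises because, under the SEM model, the value of such a bet is proportional to the stake: if one unit staked has positive value, every further unit adds the same amount. The sketch below illustrates this under stated assumptions. The function names, the bankroll of 100 units, the payoff multiple of 2, and the probability values are hypothetical, and the subjective probabilities of winning and losing are passed separately because the model does not require them to sum to one.

```python
def sem_of_stake(stake, p_win, p_lose, payoff_multiple):
    """SEM of staking `stake` on a chance event that pays `payoff_multiple`
    times the stake if it occurs and forfeits the stake otherwise.
    p_win and p_lose are subjective probabilities (assumed inputs)."""
    return p_win * payoff_multiple * stake - p_lose * stake


def best_stake(bankroll, p_win, p_lose, payoff_multiple):
    """Whole-unit stake between 0 and the bankroll with the highest SEM."""
    return max(
        range(0, bankroll + 1),
        key=lambda s: sem_of_stake(s, p_win, p_lose, payoff_multiple),
    )


# Favorable by the gambler's own lights: each unit staked is worth
# 0.4 * 2 - 0.6 = +0.2, so the maximizing stake is the whole bankroll.
favorable = best_stake(100, p_win=0.4, p_lose=0.6, payoff_multiple=2)

# Unfavorable: each unit staked is worth 0.3 * 2 - 0.7 = -0.1, so the
# maximizing stake is nothing at all.
unfavorable = best_stake(100, p_win=0.3, p_lose=0.7, payoff_multiple=2)
```

No intermediate stake can be optimal under this linear form, which is the feature that conflicts with everyday observation of gamblers.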

Laboratory data appear to support 
two generalizations about the SEM 
model: (a) The SEM model is fairly 
accurate in predicting choice between 
simple bets which differ in probability 
of winning or losing but whose out- 
comes involve the same general level of 
money. (b) The SEM model is only 
slightly if any better than the EV 
model in its ability to predict choice 
between bets with the same probabil- 
ities of winning and losing but differ- 
ing in level of money. 

Evidence for the first generalization 
comes from the work of Edwards on 
probability preferences (1953, 1954a, 
1954b, 1954c). When asked to choose 
between bets with equal EV but dif- 
fering in probability of winning, the 
subjects showed a marked group pref- 
erence for certain probabilities (e.g., 
4/8) and a marked dislike for others 
(e.g., 6/8). The same preference patterns
were found when the amount of
money involved in the bets was increased,
making an interpretation in
terms of the EU model very unlikely.
Similar results were found for individual
cases by Coombs and Pruitt
(1960), who pointed out in addition
that people differ in the probabilities
they prefer. The SEM model would
appear to be the simplest way of
explaining these preferences, and a
test of this explanation was made by
Edwards (1955). First he used a
betting method which assumed the
SEM model to measure for each subject
the subjective probability of winning
and losing at eight levels of probability.
Then he inserted these values
into the SEM model to predict probability
preferences. Eighty-three percent
of the choices between winning
bets and 65% of the choices between 
losing bets were successfully predicted. 
Other evidence suggests that predic- 
tions would have been even better if 
the choices had been replicated enough 
times to establish the dominant prob- 
ability preference pattern for each sub- 
ject. It should be noted that the EV 
model is clearly inadequate to explain 
these data since all of the bets had 
the same EV level so that there was 
no basis for making predictions.

Evidence for the second generalization
comes from an experiment by
Suppes and Walsh (1959) in which
predictions by the SEM model of
choices between two-outcome bets with
a subjective probability of winning of
½ were correct only 57% of the time.


The EU Model 

The measurement of utility has 
turned out to be a very difficult and 
elusive task. At least seven methods 
have been used over the past 10 
years, three assuming the EU model 
and four the SEU model. In each
case the utility values thus measured
have been put back into the model
to predict choices between pairs of
bets. Thus, to a greater or lesser
extent, these studies constitute a test
of the EU and SEU models. 

Two of the three EU experiments 
have not been adequate tests of the 
model. One, by Edwards (1955) 
used a method of measuring utility 
which he has since repudiated (1961) ; 
the results were inconclusive. The 
other by Coombs and Komorita (1958) 
yielded 29 out of 30 correct predic- 
tions, but all 30 would have been cor- 
rect on the assumption that the sub- 
jects preferred less risk to more, a 
characteristic they showed throughout 
their behavior. Furthermore only 
three subjects were used. 

The third study (Mosteller &
Nogee, 1951) is a better test of the
model but there are ambiguities in 
interpreting it. (Some of the results 
of this study have already been pre- 
sented in the section on the EV 
model.) Utilities of money were gen- 
erated and used to predict choices be- 
tween bets at a number of different 
levels of probability and money. These 
predictions were correct in 66% of 
the pairs, as opposed to 50% success 
with the EV model (expectation by
chance was also 50%). As in the
case of predictions from the EV model,
the accuracy of prediction increased
with an increase in EU difference
between the bets: in pairs where the
difference was over $2.50, 93% of the
predictions were correct. One might
conclude from this study that: The
EU model is somewhat better than
the EV model when the EU difference
between bets is small and improves
markedly as the EU difference increases.
However, the method used
for measuring utility is ambiguous; 
the decisions made by the subjects in 
the utility measuring sessions could 
just as well have been determined by 
subjective distortions of probability as 
by departure of the utilities from 
money values. If this had been true, 
the SEM model would have made as 
good or better predictions than the 
EU model; this possibility was not 
examined, although data were avail- 
able to do so. Conclusions about the 
value of the EU model must await this
further check. 


The SEU Model 


Aside from Edwards’ (1955) ex- 
periment mentioned in the previous 
section, which employed a question- 
able method for measuring utility and 
will therefore not be further discussed, 
all the experiments on the SEU model 
have used a chance event with both 
objective and subjective probability
equal to ½. This means two things:
(a) that the model remains untested 
over a broad range of bets and (b) 
that the EU and SEU models cannot 
be compared on the basis of existing 
data. 

In the case of bets involving money, 
the SEU model does not appear to 
have been particularly successful. One 
study (Davidson, Suppes, & Siegel, 
1957, pp. 19-81) claims success, but 
the test of the SEU model was very
weak, permitting quite variable results
to be considered positive evidence; the
results must be considered inconclusive.
The other experiment, done by
Suppes and Walsh (1959), was a 
much stronger test of the model. The 
SEU model predicted correctly only 
58% of the choices between paired 
comparisons (chance expectation: 50%)
and was only slightly better than the 
SEM model which predicted correctly 
57% of the choices. 

The SEU model made a somewhat 
better showing in one study which 
used bets composed of outcomes other 
than money. Davidson, Suppes, and 
Siegel (1957, pp. 82-103) measured 
the utility of phonograph records with 
a linear programming method. De- 
pending on certain minor assumptions, 
predictions about choices between bets 
containing these phonograph records 
were correct 67% or 71% of the time. 
Another study (Hurst & Siegel, 1956) 
used cigarettes as the outcome, but the 
number of correct predictions was not 
reported.


Summary Evaluation of the Traditional Models


All four of the traditional models 
have shown some success in predicting 
choices between bets, but most of the 
successes have been of moderate mag- 
nitude (57% to 71% correct). There 
are two possible exceptions to this 
generalization, each involving restricted
ranges of bets: (a) When positive expected
value bets differ in probability
of winning but have roughly the same 
level of money in their outcome, the 
SEM model is quite successful (83% 
correct). (b) When EU differences 
are very large, the EU model seems 
to do a very good job (93% correct) 
but there are some ambiguities in in- 
terpreting these findings. Aside from 
these limited findings, the four models 
have shown only moderate success.


WHERE DO WE GO NOW?

There are two directions which the 
study of gambling behavior might take 
from here: (a) further refinement of 
the traditional models, e.g., new at- 
tempts to measure utility and subjec- 
tive probability, more comprehensive 
studies of the range in which these 
models have predictive value, or (b)
search for new models. The first direction
has much to be said for it;
there are many gaps in our knowledge;
most of the generalizations stated
above could stand re-evaluation with
new data. But there are other considerations
which strongly suggest the appropriateness
of the second direction:
In an infant science, such as the study
of decision making, most approaches
are likely to be wrong so that the
need for innovation is likely to be
greater than the need for model refinement.
Furthermore the results
discussed above are not really encouraging
(a great deal of the variance
is still not explained) and there
is no reason to believe that the traditional
models will be able to account
for it. In view of these arguments,
the author has opted for a search for
new models.

This does not mean that we should
utterly reject the traditional models;
at this point they are the best we
have. Because they are the best we
have, they can and must be used in
building models of other situations
which resemble gambling (for examples
see Siegel, unpublished²; Snyder,
1960; Tanner & Swets, 1954). But 
some effort must also be made to im- 
prove our understanding of gambling. 

The search for new directions, which 
has culminated in the PLR model of 
this paper, was begun by Coombs and 
the author (1960) in a study of the 
utility of risk. In this study, utility 
of risk was conceptualized in terms of 
“variance preference”: the greater a
person's utility of risk the higher
variance he will prefer. A partial
model of variance preference was devised:
that the order of preference
among variances for any given probability
of winning will be a folded J
scale of variance. And this partial
model was overwhelmingly supported
by the data. This and some other
regularities of behavior found in that
study plus some observations of real
life gambling have led inductively to
a more comprehensive partial model,
the PLR model.


THE PATTERN AND LEVEL OF
RISK MODEL


The model which is developed in 
this paper applies only to bets which 
have at least one negative outcome, i.e.,
which have some element of risk. In 
the discussion section, some specula- 
tions will be presented about the pos- 
sibility of generalizing it to all bets. 


Concepts 

Every bet which has at least one 
negative outcome can be completely 
described by specifying its “pattern 
of risk” and “level of risk.” The pat- 
tern of risk (“pattern” for short) of a 
bet is determined by the number of 
possible outcomes, the probability of
achieving each outcome and the ratio
of the outcomes to one another. Two
bets have the same pattern if they have
the same number of outcomes at the
same probabilities and if the outcomes
of one can be obtained by multiplying
the corresponding outcomes of the
other by a constant. Thus two bets
with the same pattern are:

A: 1/2 chance to win 50¢
   1/2 chance to lose 50¢

B: 1/2 chance to win $1
   1/2 chance to lose $1

Two others are:

C: 1/2 chance to win [amount illegible]
   1/3 chance to lose [amount illegible]
   1/6 chance to lose [amount illegible]

D: 1/2 chance to win [amount illegible]
   1/3 chance to lose [amount illegible]
   1/6 chance to lose [amount illegible]

The level of risk of a bet is defined as
the sum of its negative outcomes
weighted by their respective probabilities
of occurrence. Thus the level
of risk of Bet A is 25¢; of Bet B, 50¢;
of Bet C, 35¢; and of Bet D, $1.05.

2 Siegel, S. Decision making and learning
under varying conditions of reinforcement.
Unpublished manuscript, 1960.

The concepts of pattern of risk and 
level of risk may seem arbitrary at 
first: A bet can be described in many 
terms; why choose these? It is the 
contention of the author that people 
usually perceive bets in these terms. 
The evidence for this lies in the struc- 
ture of most popular gambling situa- 
tions, where characteristically there is 
a sharp distinction between the chance 
events and payoff ratios on the one 
hand (pattern of risk) and the size 
of stake on the other (level of risk). 
For instance, in roulette each of the 
squares on which stakes may be placed 
corresponds to a specific chance event 
and has a fixed ratio of payoff to 
stake; thus each of the squares cor- 
responds to a specific pattern of risk. 
The size of the gamblers stake, which 
determines the level of 


the sum of its 


risk, is com- 
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pletely distinct from the square on 
which he puts it. In betting on horse 
races, the gambler chooses a pattern 
by deciding which horse to bet on and 
a level of risk by deciding what size 
ticket to buy. In friendly wagers, 
separate announcements are often 
heard of the pattern of the wager (e.g., 
“I'll give you 5 to 3 on Navy”) and
the amount bet (“O.K., I'll put $3 
on Army.”). Finally, in investment a 
sharp distinction is drawn between the 
company or bond invested in (pat- 
tern) and the number of shares or 
bonds bought (level of risk). 
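
The two concepts can be made concrete with a short computational sketch. It is not part of the original presentation: the representation of a bet as a list of (probability, outcome) pairs, the function names, the assumption that corresponding outcomes are listed in the same order, and the requirement that the multiplying constant be positive are all introduced here for illustration.

```python
from fractions import Fraction


def level_of_risk(bet):
    """Sum of the negative outcomes, each weighted by its probability.

    `bet` is a list of (probability, outcome) pairs; outcomes are in cents,
    positive for wins and negative for losses.
    """
    return sum(p * -x for p, x in bet if x < 0)


def same_pattern(bet1, bet2):
    """True if both bets have the same probabilities and the outcomes of one
    are a constant positive multiple of the outcomes of the other."""
    if len(bet1) != len(bet2):
        return False
    ratios = set()
    for (p1, x1), (p2, x2) in zip(bet1, bet2):
        if p1 != p2:
            return False
        if x1 == 0 or x2 == 0:
            if x1 != x2:
                return False
            continue
        ratios.add(Fraction(x2, x1))
    return len(ratios) == 1 and next(iter(ratios)) > 0


half = Fraction(1, 2)
bet_a = [(half, 50), (half, -50)]      # Bet A from the text, in cents
bet_b = [(half, 100), (half, -100)]    # Bet B from the text, in cents

print(level_of_risk(bet_a), level_of_risk(bet_b))  # 25 50
print(same_pattern(bet_a, bet_b))                  # True
```

Bets A and B share a pattern because every outcome of B is twice the corresponding outcome of A, while their levels of risk (25¢ and 50¢) differ by the same factor.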

In building the PLR model, it is 
assumed that people have separate at- 
titudes toward pattern and level of 
risk and that these interact to deter- 
mine gambling decisions. Attitude to- 
ward pattern is embodied in the con- 
cept “utility of pattern,” each pattern 
being assumed to have its place on a 
ratio scale of utility. In cases where 
utility of pattern is measured on an 
ordinal scale, we may speak in terms 
of pattern preferences (e.g., “Pattern 
X is preferred to Pattern Y.”). At- 
titude toward level of risk is embodied 
in the concept “utility of risk,” which 
is assumed to be measurable on a ratio 
scale. 


The PLR model 


At this stage in the development of 
the pattern and level of risk model 
(PLR model), no assumptions have 
been made about the nature of the 
relationship between utility of pattern 
and pattern itself other than the as- 
sumption that this relationship differs 
from person to person. Some specula- 
tions about this question will be pre- 
sented in the discussion section. The 
function relating utility of risk to level 
of risk is also assumed to differ from 
person to person but to have in common
for all people three characteristics:
(a) being negative for all levels of risk,
(b) being a negatively accelerated
decay curve, and (c) being zero when 
the level of risk is zero. 

The PLR model is analogous to the
traditional models inasmuch as it
provides a formula for computing the 
utility of alternatives and assumes that 
the gambler will choose the alternative 
which has the greatest utility. This 
formula may be stated as follows: 

U(X) = rx·u(px) + g(rx)    [5]

where: 


U(X) is the utility of alternative X. 

rx is the level of risk involved in X. 

u(px) is the utility of the pattern 
px of X. 

g(rx) is the utility of risk. 


Causality moves in only one direction 
in this model: U(X) is the dependent 
variable and the terms on the right of 
the equation are independent variables. 
Again it should be emphasized that the 
PLR model applies only to bets which 
have at least one negative outcome. 
The model is illustrated in Figure
1, where U(X) is shown as a function
of rx. An intuitive account of the
model follows. In order to discuss
the model intuitively, a name will be
needed for the first term on the right
hand side of Equation 5. This is
called, heuristically, “expectation of
gain”; the greater this term, the more
the gambler “expects to get out of a
bet.” The term has the following
properties, all intuitively compelling:
As the utility of a pattern u(px)
increases, expectation of gain from bets
embodying that pattern also increases.
In the case of patterns whose utility
u(px) is positive, expectation of gain
will increase as the amount of money
risked rx increases, since for a given
pattern, increasing the amount risked
also increases the amount which may
be won. But where the utility of a
pattern is negative, expectation of gain
decreases as the amount of money
risked increases.


Fig. 1. Illustration of the PLR model
showing the utility of an alternative U(X)
as a function of the level of risk involved
in that alternative rx. Also shown are the
components of U(X), the expected gain
rx·u(px) and the utility of risk g(rx).
(The ideal level of risk, which comes at
the point where U(X) is maximal, and the
maximum acceptable level of risk, MALR,
are indicated.)

Expectation of gain and utility of 
risk are conceived intuitively as hy- 
pothetical constructs, having genuine 
referents in the form of two conflicting 
or cooperating behavior tendencies 
within the individual. These tend- 
encies cooperate when the utility of a 
pattern is negative. When the utility 
of pattern is positive, however, these 
tendencies come in conflict and the
conflict increases as the level of risk
increases. Because the utility of risk
is a negatively accelerated decay curve
over level of risk, the model predicts
what we know from experience (and
what the SEM model failed to
predict), that there is a limit to the
amount of money a man will risk on a
pattern which has at least one negative
outcome, no matter how favorable this
pattern may appear.
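
Formula 5 and the limit just described can be illustrated numerically. The sketch below is an illustration only: the model does not specify the form of g, so the quadratic g(r) = -k·r² used here, the constant k, the pattern utilities, and the function names are all assumptions, chosen because such a curve is zero at zero risk, negative elsewhere, and falls ever more steeply.

```python
def utility_of_risk(r, k=0.01):
    """Assumed g(r) = -k * r**2 (not specified by the model)."""
    return -k * r ** 2


def plr_utility(r, u_pattern, g=utility_of_risk):
    """Formula 5: U(X) = rx * u(px) + g(rx)."""
    return r * u_pattern + g(r)


levels = [0, 10, 25, 50, 75, 100, 150]   # hypothetical levels of risk

# A liked pattern (u > 0): utility rises, peaks, then falls below zero.
favorable = [plr_utility(r, u_pattern=1.0) for r in levels]

# A disliked pattern (u < 0): both terms pull utility down as risk grows.
unfavorable = [plr_utility(r, u_pattern=-0.5) for r in levels]

for r, fav, unf in zip(levels, favorable, unfavorable):
    print(f"r={r:>3}  U(liked)={fav:8.2f}  U(disliked)={unf:8.2f}")
```

Under these assumptions the liked pattern is most attractive at a moderate level of risk and becomes unacceptable at the highest level shown, which is the limit on risk that the text describes.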


Derivations from the Model

In this section will be presented five
propositions drawn from the PLR
model. All of them refer to the
behavior of individuals rather than
groups.

Proposition 1. At any level of risk, 
the order of preference among patterns 
of risk will be the same as at any other 
level of risk. 

Proof. Take any two bets X and Y
at level of risk R, where X has pattern
px and Y has pattern py, and assume
that X is preferred to Y, i.e., U(X)
> U(Y). By Formula 5, this implies
that:

R·u(px) + g(R) > R·u(py) + g(R)

which, since R is positive, simplifies to:

u(px) > u(py)


Now take any other level of risk S and 
find the two bets X’ and Y’ which 
have the same patterns, px and py. 
Since we know that:

u(px) > u(py)

it follows that:

S·u(px) + g(S) > S·u(py) + g(S),

which implies by Formula 5 that
U(X’) > U(Y’), i.e., that X’ is
preferred to Y’. Thus it is shown that
the order of preference between any 
two patterns of risk at any level of risk 
will be the same as the order of prefer- 
ence between the same patterns at any 
other level, from which Proposition 1 
follows. 
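
The argument of the proof can be checked numerically: because g(R) is added to every alternative at a given level of risk, it cannot change their order. The following sketch is illustrative only. The pattern utilities and both forms of g are hypothetical, chosen merely to satisfy the three characteristics assumed for the utility of risk.

```python
import math


def plr_utility(r, u_pattern, g):
    """Formula 5: U(X) = rx * u(px) + g(rx)."""
    return r * u_pattern + g(r)


def preference_order(pattern_utilities, r, g):
    """Pattern names sorted from most to least preferred at level of risk r."""
    return sorted(pattern_utilities,
                  key=lambda name: plr_utility(r, pattern_utilities[name], g),
                  reverse=True)


def g_quadratic(r):
    return -0.01 * r ** 2           # assumed form of g


def g_exponential(r):
    return 1.0 - math.exp(0.05 * r)  # another assumed form of g


# Hypothetical utilities for the five patterns used in the experiments.
hypothetical_u = {"1/6": -0.2, "1/3": 0.1, "1/2": 0.4, "2/3": 0.3, "5/6": 0.0}

orders = {tuple(preference_order(hypothetical_u, r, g))
          for g in (g_quadratic, g_exponential)
          for r in (1, 10, 50, 200)}
print(orders)  # a single ordering, whatever the level of risk or form of g
```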


Note. In the next four propositions, two 
new terms are introduced: the “ideal level 
of risk,” which is the level of risk which 
the individual most prefers for a given 
pattern of risk, and the “maximum accept- 
able level of risk,” which is the highest level 
of risk he will voluntarily accept for a 
given pattern. 

Proposition 2. The more preferred 
a pattern of risk, the higher will be the 
ideal level of risk for that pattern. 

Proof. This proposition will be 
demonstrated with reference to Figure
1. The ideal level of risk is found, of
course, at the point where the curve 
U(X) reaches a maximum. It should 
be immediately apparent that a person 
can have only one ideal level of risk 
since the sum of a straight line and 
a curve which is negatively accelerated 
is itself a negatively accelerated curve 
and, therefore, can have at most one 
maximum. By calculus, this maxi- 
mum occurs at the level of risk where: 


U'(X) = u(px) + g'(rx) = 0

i.e., where:

u(px) = −g'(rx)

To put this in other words, the ideal 
level of risk for a pattern is found 
where the utility of that pattern equals 
the negative slope of the utility of risk 
curve. 

Now assume two patterns of risk
P and Q such that u(P) > u(Q), i.e.,
that P is more preferred than Q.
According to our calculus, the negative
slope of the utility of risk curve will
be greater at the ideal for P than at
the ideal for Q. Since the negative
slope of the risk curve increases with
increasing level of risk, this means
that the ideal for P will be found at
a higher level of risk than the ideal
for Q.

Proposition 3. The more preferred
a pattern of risk, the higher will be
the maximum acceptable level of risk
for the pattern.

Proof. This proposition will also be
demonstrated with reference to Figure
1. It is well known that a curve which
is negatively accelerated throughout
can be crossed by a straight line at
most two times. In this case, the
function U(X) of rx is the negatively
accelerated curve and the level of risk
axis (the abscissa) is the straight line.
By definition these curves cross one
another at the origin. The other
crossing, if it exists, is the maximum
acceptable level of risk, i.e., the point
where U(X) = 0. By algebra, the
maximum acceptable level of risk for
a pattern px occurs at the point where:

u(px) = −g(rx)/rx

i.e., where the utility of this pattern
equals the negative utility of risk
divided by the level of risk.

Now assume two patterns of risk P 
and Q such that u(P) > u(Q), i.e.,
that P is preferred to Q. According 
to our formula, this means that the 
ratio of the negative utility of risk di- 
vided by the level of risk will be 
greater at the maximum for P than at 
the maximum for Q. Since the func- 
tion relating the utility of risk to level 
of risk is a negatively accelerated 
growth curve, this ratio will be greater 
the greater is the level of risk. There- 
fore the maximum level of risk for P 
will be higher than the maximum 
level of risk for Q. 
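
The conditions derived for Propositions 2 and 3 can be worked through for one concrete case. The sketch below assumes a quadratic utility of risk, g(r) = -k·r², which the model itself does not specify; the constant k, the pattern utilities, and the function names are likewise hypothetical. For this g, the condition u(px) = −g'(rx) becomes u = 2kr and the condition u(px) = −g(rx)/rx becomes u = kr.

```python
def g(r, k=0.01):
    """Assumed utility of risk, g(r) = -k * r**2."""
    return -k * r ** 2


def plr_utility(r, u_pattern, k=0.01):
    """Formula 5 with the assumed g."""
    return r * u_pattern + g(r, k)


def ideal_level(u_pattern, k=0.01):
    """Level of risk where u(px) = -g'(rx); here -g'(r) = 2*k*r."""
    return max(u_pattern, 0.0) / (2 * k)


def max_acceptable_level(u_pattern, k=0.01):
    """Nonzero level of risk where u(px) = -g(rx)/rx; here -g(r)/r = k*r."""
    return max(u_pattern, 0.0) / k


# Hypothetical pattern utilities, from least to most preferred.
for u in (0.2, 0.4, 0.8):
    print(u, ideal_level(u), max_acceptable_level(u))
```

Both quantities increase with the utility of the pattern, as the two propositions require. That the maximum acceptable level is exactly twice the ideal is a property of the assumed quadratic form, not of the model.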

Proposition 4. For any given pat- 
tern of risk, divide the levels of risk 
into two groups: those smaller than 
the individual’s ideal and those greater 
than his ideal. Within each group, 
levels of risk will be more preferred 
the closer they are to the ideal level 
of risk. 

Proof. As was mentioned above 
under Proposition 2, the ideal level of 
risk occurs at the point where U(X) 
reaches a maximum. Since U(X) is 
negatively accelerated throughout, it 
can have at most one maximum and no 
minima. The proof of Proposition 4 
comes from the nature of such a 
maximum ... namely, that the curve 
rises ever higher as it approaches the 
maximum and declines ever more 
sharply as it proceeds beyond the max- 
imum. This implies a theorem that 
on either side of the ideal the utility 
of bets with a given pattern increases
as the level of risk approaches the ideal
level of risk. Proposition 4 is a
restatement of this theorem in preference
language. 

Proposition 5. For any given pat- 
tern of risk, all levels of risk up to 
the maximum acceptable level of risk 
will be acceptable. 

Proof. This proposition follows 
from the assumption, stated in the 
discussion of Proposition 3, that the 
function U(X) of rx can cross the
level of risk axis at most two times, 
the first of which is the origin and 
the second, if it exists, the maximum 
acceptable level of risk. Since it can- 
not cross the level of risk axis between 
these points, it remains positive be- 
tween them . . . all levels of risk up to 
the maximum level have a positive 
U(X), which means that they are all 
acceptable. 
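
Propositions 4 and 5 can be illustrated on a grid of levels of risk. The sketch is illustrative only: the exponential form g(r) = 1 − exp(0.05·r), the pattern utility of 0.5, and the grid of levels 1 to 100 are assumptions made here, not quantities given by the model.

```python
import math


def plr_utility(r, u_pattern):
    """Formula 5 with an assumed g(r) = 1 - exp(0.05 * r), which is zero at
    r = 0, negative for r > 0, and falls ever more steeply."""
    return r * u_pattern + (1.0 - math.exp(0.05 * r))


u_pattern = 0.5                    # hypothetical utility of one pattern
levels = list(range(1, 101))       # hypothetical levels of risk
utilities = [plr_utility(r, u_pattern) for r in levels]
peak = utilities.index(max(utilities))   # position of the ideal level

# Proposition 4: utility rises up to the ideal and falls beyond it.
rising = all(a < b for a, b in zip(utilities[:peak], utilities[1:peak + 1]))
falling = all(a > b for a, b in zip(utilities[peak:], utilities[peak + 1:]))

# Proposition 5: the acceptable levels (U > 0) form one unbroken run
# from the lowest level up to the maximum acceptable level.
acceptable = [r for r, u in zip(levels, utilities) if u > 0]
no_gaps = acceptable == list(range(acceptable[0], acceptable[-1] + 1))

print(levels[peak], acceptable[-1], rising, falling, no_gaps)
```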


Experimental Evidence for the Propositions


The experimental evidence presented 
in this section is derived from two 
experiments on gambling. 

The method and part of the results 
of the first experiment are reported 
more extensively elsewhere (Coombs 
& Pruitt, 1960). The procedure con- 
sisted of presenting a large number of 
forced choices between two-outcome 
bets, such as the following: 


Bet A: 1/3 to win $1.40
       2/3 to lose 70 cents

Bet B: 1/2 to win $1.00
       1/2 to lose $1.00

All bets had an expected value (EV) 
of zero. The bets were organized 
into five sets. Those in Sets I, II, and 
III had a constant probability of winning
and differed in variance.³ The


3 The variance of bets of this type is
computed by the formula σ² = pq(a − b)²,
where p and q are the probabilities of
winning and losing respectively, and a and b
are the amounts to be won and lost.
probability of winning for Set I was
1/3; for Set II, 1/2; and for Set III, 
2/3. There were six levels of vari- 
ance in each set: .12, .30, 1.0, 5.0, 25.0, 
and 100.0. The bets of Sets IV and V 
had a constant variance and differed 
in probability of winning. Set IV had 
a variance of 1.0 and Set V a variance 
of 25.0. The same levels of probability 
were used in both sets: 1/6, 1/3, 1/2, 
2/3, and 5/6. All of the bets within 
each set were paired with one another 
and each of the resulting pairs was 
presented eight times. Ninety-nine 
undergraduates were used as subjects. 

The second experiment was similar 
to the first in most respects but had 
six sets. Again the bets in Sets A, 
B, and C had a constant probability of 
winning and differed in variance. Set 
A had a probability of 1/3, Set B of 
1/2, and Set C of 2/3. Seven levels
of variance were employed in each set:
.12, 2.0, 5.1, 12.5, 21.2, 32.0, and 45.2.


The bets in Sets D, E, and F had a 
constant variance and differed in the 
probability of winning. Set D had a 
variance of .12, Set E of 12.5, and 
Set F of 45.2. In each set there were 
three levels of probability: 1/3, 1/2, 
and 2/3. As in the case of the first 
experiment, the bets were exhaustively 
paired within Sets D, E, and F, and 
each pair was presented eight times. 
However, in the case of Sets A, B, 
and C, an accept-reject method was 
used instead of paired comparisons; 
each bet was presented eight times 
and the subject was asked in each case 
to decide whether or not to accept the 
bet. Thirty-nine undergraduates were 
used as subjects. 

Before discussing the results, three
things must be said about the relation- 
ship between these experimental oper- 
ations and the terms of the PLR 
model. (a) It can be easily shown 
that for two-outcome bets with EV 
of zero, the pattern of risk is
completely determined by the probability
of winning. Thus, the five probabilities
of winning used in these experiments
correspond to five patterns of
risk: 1/6, 1/3, 1/2, 2/3, and 5/6. 
(b) Within the range of probabilities 
used in these experiments, two bets 
with different probabilities of winning 
(different patterns, that is) and the 
same variance will have roughly the 
same level of risk. And level of risk 
is monotonically related to variance.
Thus, the six levels of variance in the 
first experiment correspond roughly 
to six levels of risk, and the seven 
levels of variance in the second ex- 
periment to seven levels of risk. (c) 
The design of these experiments
permits us to make three kinds of
analysis: In the case of stimulus Sets I,
II, III, A, B, and C, we can study 
preferences among levels of risk while 
holding the pattern of risk constant. 
In the case of Sets IV, V, D, E, and 
F, we can study pattern preferences 
while holding level of risk constant. 
Finally, for each subject, we can com- 
pare his preferences among levels of 
risk with his pattern preferences to 
discover interactions. 
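
Points (a) and (b) follow from a little algebra, sketched here as an illustration rather than as part of the original argument. Let a be the amount won with probability p and l the amount lost with probability q = 1 − p. An EV of zero requires pa = ql, so the ratio a/l = q/p is fixed by p alone, which is point (a). Taking the losing outcome as −l in the variance formula σ² = pq(a − b)² gives σ² = ql²/p, so that the level of risk ql equals the square root of pqσ². The function names below are hypothetical.

```python
import math


def zero_ev_bet(p_win, variance):
    """Return (amount_won, amount_lost) for a two-outcome bet with EV = 0.

    Uses the variance formula var = p*q*(a - b)**2 with the losing
    outcome b taken as -amount_lost.
    """
    q = 1.0 - p_win
    amount_won = math.sqrt(variance * q / p_win)
    amount_lost = math.sqrt(variance * p_win / q)
    return amount_won, amount_lost


def level_of_risk(p_win, variance):
    """Probability of losing times the amount lost."""
    _, amount_lost = zero_ev_bet(p_win, variance)
    return (1.0 - p_win) * amount_lost


# Pattern 1/3 at variance 1.0: roughly $1.41 to win and $0.71 to lose,
# close to the example bet of $1.40 and 70 cents.
won, lost = zero_ev_bet(1 / 3, 1.0)

# Levels of risk for three patterns at the same variance.
risks = {p: level_of_risk(p, 1.0) for p in (1 / 3, 1 / 2, 2 / 3)}
print(round(won, 2), round(lost, 2), risks)
```

At a variance of 1.0 the three patterns 1/3, 1/2, and 2/3 have levels of risk of about 47, 50, and 47 cents, which is the sense in which equal variance means roughly equal level of risk; for a fixed pattern, level of risk grows with the square root of the variance.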

Proposition 1. At any level of risk, 
the order of preference among pat- 
terns of risk will be the same as at any 
other level of risk. 

The responses to Sets IV and V in 
the first experiment provide us an 
opportunity to compare each subject's 
order of preference among the five 
patterns (1/6 to 5/6) at a low level 
of risk (variance of 1.0) with his order 
of preference among the same five pat- 
terns at a high level of risk (variance 
of 25.0). Proposition 1 implies that 
these two preference orderings will be 
identical. 

Preference orderings were obtained 
from the data by assuming that a subject
preferred Bet A to Bet B if he chose 
A on more than half of the occasions 
when it was paired with B. Inspection
of the results revealed striking
similarity between the orders of
preference at the two levels of risk, as
predicted; but it is necessary to ex- 
press this similarity in a quantitative 
fashion. An index was therefore de- 
vised for expressing these preference 
orderings, which ranged from a low of 
1, for a pattern of consistent preference 
in all pairs for the smaller probability 
of winning, to a high of 11, for a 
pattern of consistent preference for 
the larger probability of winning.⁴
Two such ratings were computed for 
each subject, one to describe his pat- 
tern preferences in Set IV, the other 
to describe his preferences in Set V. 
The correlation, over subjects, be- 
tween these two ratings was .93, a 
highly significant result, indicating con- 
siderable stability in the order of pref- 
erence among patterns as one moves 
from a low to a high level of risk. 

Similar results were found in the 
second experiment for Sets D, E, 
and F.

Proposition 2. The more preferred
a pattern of risk, the higher will be
the ideal level of risk for that pattern.

Since all of the levels of risk within 
a set were exhaustively paired, it was 
possible to identify for each of the 
Sets I, II, and III of the first
experiment that level of risk which was
preferred over all the others, i.e., the
ideal level of risk. This meant that
an ideal level of risk was identified
for three patterns: 1/3 (Set I), 1/2
(Set II), and 2/3 (Set III). An
index was devised for expressing these 
ideals, which ranged from 1, indicating 
a preference for the lowest level of 
risk, to 16, indicating a preference for 
the highest available risk. (Since there 
exist levels of variance below .12 and 
above 100.0, this scale has an arbitrary 
ceiling and floor.) Preferences between
the three patterns were ascertained
from the data of Sets IV and V
combined.

4 This and subsequent indices are
thoroughly described in the previous report
(Coombs & Pruitt, 1960).

Proposition 2 was tested in the fol- 
lowing form: If one pattern is pre- 
ferred to another, there will be a 
higher ideal level of risk for the former 
than for the latter. Data supporting 
this hypothesis are shown in Table 1. 

In Table 1a the subjects are divided
into two groups (columns) on the 
basis of which pattern they preferred, 
1/3 or 1/2. (Two subjects are omitted 
who did not have a preference between 
the two patterns.) Each of these 
two groups is divided into three sub- 
groups (rows) depending on whether 
the ideal level of risk for Pattern 1/3 
was greater than, equal to, or less than 
the ideal for Pattern 1/2. The results 
show a strong tendency for the more 
preferred pattern to have a higher
ideal level of risk, as hypothesized. 
χ² = 23.13 for the contingency table,
with p < .01. The relatively large
number of cases in which both ideals 
were equal might be thought to weaken 
this hypothesis. However, for most 
of them (25 out of 30), the most 
preferred level of risk for both pat- 
terns was at one or the other of the 
extreme ends of the range of levels of 
risk used in this experiment and there- 
fore did not necessarily reflect the 
true ideal. Had the range of values 
been greater, most of these cases 
would undoubtedly not have exhibited 
equal ideals. The degree to which 
these data support the hypothesis can 
perhaps best be summarized in terms 
of the proportion of cases correctly 
predicted: In 82% of the cases where 
there was a difference in ideals, the 
direction of this difference could be 
predicted by knowing which pattern 
was preferred. 

Table 1b shows the same sort of 
results for the comparison between 
Pattern 1/3 and Pattern 2/3. Again 
the hypothesis is strongly supported 
(χ² = 35.77, p < .01), with correct
prediction in 84% of the cases where
there was a difference in ideals.
Sixteen of the 18 equal cases were at an
extreme end of the range of levels of
risk. Table 1c, for the Patterns 1/2
and 2/3, also supports the hypothesis
(χ² = 19.65, p < .01), with correct
prediction in 74% of the cases where
there was a difference in ideals. All
30 equal cases were at an extreme.


TABLE 1

CONTINGENCY TABLES SHOWING RELATION
BETWEEN PATTERN PREFERENCE AND
IDEAL LEVEL OF RISK

Subtables: a. 1/3 vs. 1/2; b. 1/3 vs. 2/3;
c. 1/2 vs. 2/3. Columns: pattern preferred.
Rows: ideal for the first pattern greater
than, equal to, or less than the ideal for
the second.

Note. Entries are number of subjects.

Proposition 3. The more preferred 
a pattern of risk, the higher will be 
the maximum acceptable level of risk 
for that pattern. 

Data concerning the maximum acceptable
level of risk were available
from the responses to Sets A, B, and 
C of the second experiment, where the 
subjects were asked to accept or reject 
any or all of seven levels of risk for 
three patterns, 1/3, 1/2, and 2/3. A 
level of risk was assumed to be acceptable
to the subject if he accepted it
at least five out of eight times it was 
presented to him. An index was de- 
rived for each pattern which ranged 
from 0 (no level of risk acceptable) 
to 7 (highest risk offered was the 
highest acceptable level). Preferences 
between the three patterns were ascer- 
tained in the usual way from the data 
of Sets D, E, and F combined.

Proposition 3 was tested in the fol- 
lowing form: If one pattern of risk is 
preferred to another, the maximum 
level of risk acceptable for the former 
will be greater than that acceptable 
for the latter. Data supporting this 
hypothesis are shown in Table 2. 

In all three subtables there is a 
strong association between preference 
for a pattern and willingness to accept
a higher maximum level of risk at
that pattern, as hypothesized. (For
Table 2a, χ² = 14.12, p < .01; for
Table 2b, χ² = 17.36, p < .01; for
Table 2c, χ² = 16.97, p < .01.) Again
the relatively large number of cases in
which the maximum acceptable levels
of risk for both patterns are equal
does not necessarily weaken the
hypothesis since most of them occur at
the extremes of the scale (12 out of 
17 in Table 2a; 7 out of 10 in Table 
2b; and 13 out of 14 in Table 2c). 

TABLE 2

CONTINGENCY TABLES SHOWING RELATION
BETWEEN PATTERN PREFERENCE AND
MAXIMUM ACCEPTABLE LEVEL OF
RISK (MALR)

Subtables a, b, and c. Columns: pattern
preferred. Rows: MALR for the first pattern
greater than, equal to, or less than the
MALR for the second.

Note. Entries are number of subjects.


Proposition 4. For any given pattern
of risk, divide the levels of risk
into two groups: those smaller than
the individual's ideal and those greater
than his ideal. Within each group,
levels of risk will be more preferred
the closer they are to the ideal level
of risk.

Consider the data from Sets I, II,
and III of the first experiment.
Proposition 4 implies that within each set,
levels of risk above a subject’s ideal 
will be preferred in order of ascending 
magnitude whereas levels of risk be- 
low his ideal will be preferred in order 
of descending magnitude, i.e., that the 
preference orderings will be folded J 
scales, in the terminology of the first 
report (Coombs & Pruitt, 1960). As 
shown in the first report, the results 
strongly supported this hypothesis: In 
Set I, out of 92 nonrandom orderings 
of preference among levels of risk, 81 
were of the postulated type. In Set II, 
89 out of 96 were of this type. In 
Set III, 83 out of 93 were of this type. 

Proposition 5. For any given pat- 
tern of risk, all levels of risk up to 
the maximum acceptable level of risk 
will be acceptable. 

Proposition 5 can easily be tested 
by an examination of the data from 
Sets A, B, and C of the second experi- 
ment. Within each set, the maximum 
acceptable level of risk is the highest 
level of risk which was accepted at 
least five out of eight times it was 
presented to the subject. The hypo- 
thesis is that within each set all levels 
of risk smaller than this maximum will 
also be accepted at least five out of the 
eight times they were presented. The 
hypothesis was confirmed in 110 of 
the 117 possible sets of data (3 sets 
times 39 subjects), which is, of course,
exceedingly strong evidence for the
proposition.
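
The acceptance criterion and the test of Proposition 5 amount to a simple rule, sketched below. The acceptance counts are hypothetical and are not data from the experiment; the function names are likewise introduced here for illustration.

```python
ACCEPT_THRESHOLD = 5   # a level is acceptable if accepted on at least 5 of 8 trials


def malr_index(accept_counts):
    """Rank (1 = lowest risk) of the highest acceptable level, or 0 if none.

    accept_counts[i] is the number of times, out of 8, that the level of
    risk with rank i + 1 was accepted.
    """
    acceptable = [i + 1 for i, n in enumerate(accept_counts)
                  if n >= ACCEPT_THRESHOLD]
    return max(acceptable, default=0)


def confirms_proposition_5(accept_counts):
    """True if every level below the maximum acceptable level is also acceptable."""
    top = malr_index(accept_counts)
    return all(n >= ACCEPT_THRESHOLD for n in accept_counts[:top])


consistent = [8, 8, 7, 6, 5, 2, 0]     # hypothetical subject, seven levels
inconsistent = [8, 3, 7, 6, 1, 0, 0]   # hypothetical: second level rejected

print(malr_index(consistent), confirms_proposition_5(consistent))      # 5 True
print(malr_index(inconsistent), confirms_proposition_5(inconsistent))  # 4 False
```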

DISCUSSION

Comparison with Other Models


The four traditional models. The 
question naturally arises: Can the ex- 
perimental results which support the 
PLR model be predicted or explained 
by one or more of the four traditional 
models? If so, one might argue that 
there is no need for a new model. 

Two of these models yield clearcut 
predictions which are so grossly wrong
that the models can easily be rejected
in the context of these data. The EV 
model is obviously inadequate because 
it predicts indifference between the 
bets of these experiments (which all 
have an EV of zero) whereas in 
reality consistent preference was the 
rule. The SEM model is also obvi- 
ously inadequate because, as mentioned 
earlier, it predicts that if a man likes 
a pattern at all he will insist on choos- 
ing the highest level of risk he can 
obtain for this pattern. Many sub- 
jects in the first experiment preferred 
moderate levels of risk, so that this 
prediction was not borne out. 

The other two models are less easy 
to disprove because their predictions 
are not clearcut unless utility of money 
is carefully measured. Though such 
measurement was not attempted in 
these experiments, it seems reasonable 
to reject the models on another basis, 
namely that there is nothing in either
the EU or SEU models, as usually
stated, which would lead one to
predict any of the regularities of behavior
described in the five propositions and
supported by the findings of these
experiments. Perhaps, in the manner
of the Procrustean bed, one of the
traditional models could be stretched, 
new parameters added or more com- 
plex utility curves drawn to encompass 
these results. But the beauty of theory 
is its simplicity; so that it seems more 
reasonable to make striking new regu- 
larities of behavior such as these the 
basis of a new model. 

Again it must be stressed that the 
traditional models should not be com- 
pletely discarded; if possible, some 
common ground should be found be- 
tween the new model and what is good 
about the old ones. In the first section 
it was shown that the model which 
best fit the segment of data to which 
it was applied was the SEM model 
(Edwards, 1955). A way in which
this model can be incorporated into
the PLR model without increasing the 
complexity of the latter will be shown 
in the section on research implications. 

Probability and variance preferences.
The PLR model does not cast aside
the concepts of probability and
variance preference (Coombs & Pruitt,
1960; Edwards, 1953) but rather
provides a larger framework within which
they can be understood and related to
one another. More specifically, the
notion of “probability preference” is
generalized in the concept “pattern
preference,” which embraces attitudes
toward both probability and ratios of
outcomes. The earlier finding
(Edwards, 1954b) that probability
preferences are unaffected by changes in
money level is generalized into
Proposition 1. The notion of “variance
preference” is translated into the
concept “ideal level of risk”; and the
hypothesis that variance preferences
fit a folded J scale of variance, proposed
by Coombs and the author (1960), is
expressed in Proposition 4, which is
mainly dependent on the negatively
accelerated character of the utility of
risk curve. In addition, the PLR
model goes a step beyond these earlier
formulations by showing, in
Proposition 2, how probability and variance
preferences are related to one another
and permitting predictions concerning
the maximum acceptable level of risk
in Propositions 3 and 5.



Research Implications 

One of the important features of 
the PLR model is the implication of 
Proposition 1 that pattern preferences 
can be studied independently of level 
of risk. If true, this greatly simplifies 
research on gambling by permitting 
study of one dimension at a time; 
but this assertion must itself be further 
examined in the context of a wider 
range of patterns.

In studying pattern preferences, one
might ask the question, What are the 
dimensions of pattern which determine 
preferences among them? Two di- 
mensions which are suggested by the 
structure of simple wagers (e.g., “I'll 
give you 5 to 3 on the Cubs”) are the 
subjective probabilities of outcomes 
and the ratios of objective outcomes to 
one another. For example, the utility 
of pattern in the PLR model could be 
expressed by a subjectively expected 
ratio (SER) model, which would be 
quite similar to the SEM model and 
would, therefore, hopefully retain its 
best features (prediction of choices 
between bets differing in pattern when 
level of risk is held constant). The 
formula for computing the utility of 
pattern would be: 


u(px) = SER = Σi pi*·ti    [6]

where:

ti equals |$i| / Σi |$i|,

the ratio of the (absolute) amount of 
money in the ith outcome to the sum 
of the (absolute) amounts of money 
in all of the outcomes of the bet. 

One serious deficiency of the PLR 
model is its failure to apply to bets
with no negative outcomes, where rx 
equals zero. There is a very simple 
modification which would make it 
universally applicable (and satisfy
the “sure thing” principle for bets
with no negative outcomes), namely
to add another objective parameter
mx = Σi |$i| and rewrite the basic
formula:

U(X) = mx·u(px) + g(rx)    [7]

Since mx is a monotonic, positive
function of rx for any pattern involving
risk, Propositions 2 through 5 are 
unaltered by this procedure. Proposition 1
would have to be revised but
not seriously. Since it has broader
application, Formula 7 might have 
been used throughout this paper instead
of Formula 5. However, the
latter is simpler and easier to present, 
and therefore was chosen. 
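
The relation between mx and rx can be illustrated for a fixed pattern. Bets sharing a pattern differ only by a constant multiplier on their outcomes, so mx and rx are scaled by the same constant and their ratio stays fixed. The sketch below is illustrative: the list representation of a bet, the function names, and the bet with no negative outcome are assumptions made here.

```python
def level_of_risk(bet):
    """rx: sum of the negative outcomes weighted by their probabilities."""
    return sum(p * -x for p, x in bet if x < 0)


def total_money(bet):
    """mx: sum of the absolute amounts of money over all outcomes."""
    return sum(abs(x) for _, x in bet)


def scale(bet, c):
    """Same pattern at a different level: multiply every outcome by c."""
    return [(p, c * x) for p, x in bet]


base = [(0.5, 50), (0.5, -50)]   # Bet A from the Concepts section, in cents
ratios = [total_money(scale(base, c)) / level_of_risk(scale(base, c))
          for c in (1, 2, 5)]
print(ratios)   # the same ratio at every level of risk

# A hypothetical bet with no negative outcome: rx is zero, so Formula 5
# is silent, but mx is positive and Formula 7 still applies.
sure_thing = [(0.5, 100), (0.5, 20)]
print(level_of_risk(sure_thing), total_money(sure_thing))   # 0 120
```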
DISCRIMINATIVE CLASSICAL CONDITIONING IN DOGS
PARALYZED BY CURARE CAN LATER CONTROL
DISCRIMINATIVE AVOIDANCE RESPONSES
IN THE NORMAL STATE ¹


RICHARD L. SOLOMON AND LUCILLE H. TURNER


University of Pennsylvania


The use of curare drugs to immobilize
subjects in psychological experiments
has special advantages for
research on learning, transfer of
training, and thinking. Subjects cannot
give overt, skeletal responses, and
therefore they cannot modify the
exteroceptive or proprioceptive stimulus
patterns that the experimenter wishes
to control.

We shall show that a discriminative,
classical conditioning procedure carried
out on curarized dogs can later control
instrumental avoidance responses in the
normal state. We shall describe an
experiment in which dogs were first
trained to avoid shock by pressing a
panel whenever a signal light (S°) went
out. The panel-press restored the light.
Then, when the avoidance response was
stable, the dogs were totally curarized.
Under curare the dogs were presented
on some trials with a tone (S+)
followed 10 seconds later by a shock of 5
seconds duration. A contrasting tone
(S−) was presented on other trials,
never followed by shock. A few days
later, after complete recovery from
curare, the dogs were tested for avoidance
responding to S+, S−, and S°. All dogs
then showed the differentiating effects
of the classical, discriminative
conditioning experience. The avoidance
response to S+ was usually of the same
order of magnitude as that to S°, while
avoidance responding to S− was weak,
absent, or quick to extinguish. We do
not believe that precise discriminative
transfer of training, from the classical
conditioning of reflexes to the
instrumental learning of responses, has
heretofore been demonstrated in this
manner.²


1 This research was facilitated by the
Laboratory of Social Relations, Harvard
University. Pilot studies over the past seven
years were supported by grants from the
Medical Sciences Division of the Rockefeller
Foundation and a Ford Foundation
Grant-in-Aid. The experiment in this paper was
carried out with the support of a National
Science Foundation Grant (NSF-G-14438).
We are grateful to P. D. Watson, A. H.
Black, N. J. Carlson, and M. R. Westcott for
their assistance during many pilot studies.

2 The general objective of our experiment certainly is not original;
see Estes (1943, 1948). In these experiments rats were subjected to
presentations of a tone followed by food, and then they were later
trained to press a lever for food reinforcement. During a test period
the rate of lever pressing was increased by presentation of the tone
although the lever pressing response had not previously been
associated with the tone. The major difference between Estes’
experiments and ours lies in our use of the curare paralysis during
discriminative conditioning so as to eliminate the possibility of
response generalization from the conditioning situation to the
instrumental learning situation.

It is of considerable theoretical importance, as well as of empirical
interest, to know that skeletal avoidance responses acquired in the
normal state can be controlled by classical conditioning procedures
carried out under curare. Some current versions of S-R reinforcement
theory emphasize the importance of differential peripheral, skeletal
responses and their correlated proprioceptive cues in the development
of discriminative instrumental acts.
Discriminative learning is hypothesized
to take place by the action of differen- 
tial reinforcement on two different re- 
sponses to two different stimuli. 
Because we shall show that discrimi- 
native instrumental acts can be estab- 
lished without those acts having been 
reinforced differentially (because they 
could not occur under curare), a
revision of theories emphasizing the 
importance of peripheral, skeletal re- 
sponding and proprioceptive cues seems 
definitely to be required. Such a re- 
vision was once the goal of “latent 
learning” studies. But they failed to 
give clearcut evidence because the S-R 
reinforcement theorist who emphasized 
peripheral response components could 
always claim: (a) that overt responses 
were occurring during the latent learn- 
ing period, (b) that some kind of rein- 
forcement was apt to occur following 
those responses, and (c) that those re- 
sponses were similar in some way to 
the final required performances. These 
were cogent objections. They can be 
effectively circumvented, however, by
the use of the curare-conditioning
preparation. Stimulus sequences presented
to the totally curarized subject cannot 
be accompanied by differential skeletal 
responding. In addition, skeletal re- 
sponding cannot occur in the presence 
of differential reinforcement. There 
are no overt acts for the experimenter 
to reward or punish. Thus the carryover
or transfer of “information” acquired
under curare so as to influence
skeletal responses in the normal state 
could be mediated peripherally only by 
differential ANS responding or “anxi- 
ety.” Whether or not such a transfer 
phenomenon would be considered to be 
a function of “pure” central mediation 
or latent learning, or whether the me- 
diation would be considered to be a 
function of anxiety, would, of course, 
depend on one’s theoretical biases. A 
position emphasizing the mediation of 
learning and transfer by proprioceptive
or kinesthetic cues suffers most in the 
light of our findings. 


HISTORY OF PROBLEM


The theoretical issue to which our experi- 
ment is addressed is certainly an old one. In 
the days when psychology was dealing with 
mental content, there were those who main- 
tained that the response processes of the 
skeletal musculature were necessary for per- 
ception and thought. In 1887, F. M. Mueller 
declared that we cannot think without words. 
This position was espoused by Watson and by Max when they identified
implicit linguistic responses with thought processes. Early
behaviorism was peripheralistic in nature. But so were the systematic
approaches of many students of consciousness. William James
emphasized peripheral response events in the production of emotional
feelings. E. Jacobson maintained that awareness itself was a
by-product of motor reactions. For example, he argued that muscular
relaxation was accompanied by the disappearance of imagery and
thinking.

While such peripheralistic interpretations of mental events were
useful to the early behaviorists, the introspective methods used to
arrive at the so-called “motor theory of consciousness” were shunned
by them. At the hands of behaviorists, precise laboratory
experimentation, aimed at the study of overt behavior as a function of
stimulation conditions, replaced introspection. But the major idea of
the “motor theory of consciousness” (that the important mental
processes such as perception, feeling, thinking, and imagining were
caused by peripheral motor reactions) was retained by the
peripheralistic behaviorists in new form. Complex behavioral
phenomena, difficult to place in the simple S-R formula, were
interpreted in a peripheralistic way. Thus, animals did not think, but
instead they showed transfer of training mediated by chains or
organizations of “response-produced cues.” Animals did not “expect,”
but instead they maintained their future-orientations by virtue of
response-produced cues which served to bridge time. Implicit
anticipatory goal responses became a major conceptual tool of the
Hullians, while differential proprioceptive and kinesthetic cues
carried the Skinnerian subjects over their toughest problem solving
challenges. Of course, many of the old theoretical controversies
revolved around the validity of these peripheralistic positions.

Harlow and Stagner (1933), following a suggestion from Donald G.
Marquis, were the first to see the importance of a paralyzing drug in
hitting at the crux of the peripheralist-centralist controversies,
whether the controversies concerned the motor theory of consciousness
or the latent learning concept and its challenge to peripheralistic
learning theory. These experimenters argued that, if the motor theory
of consciousness were correct, conditioning in paralyzed animals
should not transfer to the normal state, since the conditioning
experience would be ineffective. They further pointed out that if
discrimination learning were a function of the differential
reinforcement of overt skeletal acts, then conditioning in paralyzed
animals should not transfer to the normal state, since there would be
no differential responses in the paralyzed state.

Harlow and Stagner used the drug curare 
in their experiments. Curare produced a 
complete flaccid paralysis in animal subjects.
Its action was at that time thought to consist 
mainly of chemical blocking at the junction 
of the motor axon and muscle cell. So cu- 
rare could remove skeletal motor responses 
and their proprioceptive feedback and create 
a preparation in which both the motor theory 
of consciousness and the theory of latent 
learning could be tested. Therefore, the ex- 
periments by Harlow and Stagner (1933) 
were considered to be crucial. Harlow and 
Stagner studied both instrumental avoidance 
learning and pupillary conditioning in cura- 
rized cats and dogs. In one experiment, they 
first pretested subjects for aversion to the fol- 
lowing stimuli in the experimental room: a 
chair, bell, light, and buzzer. Then the sub- 
jects were put on a metallic grid, and the 
latency of jumping off it, in response to the 
various pretested stimuli, was measured. 
Finally, all subjects were curarized to the 
stage where there was “absence of muscle 
twitches, gag reflex, and the corneal eye-lid 
reflex” (Harlow & Stagner, 1933, p. 287). 
Then followed 30 pairings of conditioned 
stimulus and electric shock to the grid on 
which the subjects had been placed. Under 
such treatment a normal, undrugged control 
group learned to jump off the grid in re- 
sponse to the conditioned stimulus (CS) in 
three trials or less. When the curarized 
subjects had recovered from the effects of the 
drug, they were reintroduced to the grid, 
presented with the CS, but showed not the 
slightest evidence of learning. 

In a second experiment, Harlow and Stag- 
ner presented in sequence to curarized sub- 
jects a light, a bell, and a shock. The pupil 
initially contracted to the light, dilated to the 
shock, but showed no change to the bell. But 
after two to six presentations of the three-stimulus sequence to
curarized subjects, the pupillary dilation, originally elicited by
shock, became elicitable by the bell alone. Said the experimenters
(Harlow & Stagner, 1933):

It thus appears certain that cortical activity (to the extent
required for the acquisition of the conditioned pupillary response)
is not depressed by curare, at least in the amounts used in this
experiment (p. 289).

Then two extinction experiments were carried out. Three cats were
trained to leap from the electrified grid in response to a CS. Later,
when curarized and placed on the grid, they were presented with the
CS alone. When the subjects recovered from curarization, and were
placed on the grid again, they leaped from the grid with short
response latencies. In another group, pupillary dilation was
conditioned while the subjects were in the normal state. Then, while
the subjects were curarized, the CS was presented from 100 to 150
times without the shock, and the conditioned dilation disappeared.
Yet skeletal avoidance responses were later shown to have remained
intact in these same subjects. Harlow and Stagner felt that they had
established the fact that adaptive behavior is not learned under
curare. However, they asked whether it might be possible that CS-US
pairings under curare might have central effects which are not
organized in such a way as to produce a specific response. A savings
test might reveal this. Therefore, two control puppies were given 30
CS-US sequences under curare, then trained again later in the normal
state. They were compared to two puppies which were trained only in
the normal state. There were no differences between the two pairs of
subjects in either acquisition or extinction rates. In another
experiment, conditioned forelimb flexion was used (a more limited and
segmental response than jumping off the grid), but it revealed no
learning under curare. The conclusion reached by Harlow and Stagner
(1933) was this: “. . . it appears that presentation of stimuli alone
will not cause learning if no reaction is made” (p. 293). The
conclusion hit hard at latent learning theory.

In direct contradiction to the opinion of Harlow and Stagner, Light
and Gantt (1936), using a different method, concluded that “. . . the
peripheral nerve and executor organ are not necessary for conditioned
reflex formation” (p. 35). In four dogs the right hind leg was
paralyzed by crushing the anterior nerve roots. Before regeneration of
the injured nerves, simple conditioning was carried out on the
paralyzed side. A CS was
paired with shock to the immobile leg; when a diffuse conditioned
response (CR), consisting of howling and struggling, was established,
training was stopped. Later, after regeneration of the damaged motor
nerves, the CS was presented (Light & Gantt, 1936), “and it was
followed by withdrawal of the formerly paralyzed leg, the appropriate
and specific conditioned movement, but one which was never possible
during the period of training” (p. 35). At the time it was certainly
possible that the Harlow and Stagner experiment was not inconsistent
with this result. After all, Light and Gantt had used a classical
conditioning procedure, and this procedure had produced pupillary
conditioning in the immobilized state for Harlow and Stagner. (The
psycho-pharmacological properties of the curare used by Harlow and
Stagner had not yet come under scrutiny.) Then, too, the Crisler
(1930) findings had indicated that classical conditioning could occur
in the case of a blocked salivary reaction. Perhaps the distinction
that would resolve the discrepancy was that between classical
conditioning and instrumental training. Perhaps the conclusion of
Harlow and Stagner applied only to the latter procedure?

Settlage (1936), using a Pavlovian delayed conditioning procedure
(CS-US interval of 2 seconds), attempted to condition 12 cats under
sodium amytal. The drug level was adjusted so that when the CS was
presented, the subjects gave no anticipatory limb withdrawals. Then
conditioning pairings of CS and shock were given, ranging from 10 to
50 in number from stimulus to stimulus. Later, in the normal state the
CS was presented alone. A high percentage of limb withdrawals was
noted. Settlage felt that a new S-R connection was established under
sodium amytal. However, this finding was not critical for a
peripheral-motor learning position because, while under amytal, the
unconditioned stimulus (US) was often capable of eliciting a limb
withdrawal, even though the CS was not. It was, however, an
interesting and instructive finding. Here was a case where the
organism was in a state such that an unconditioned response (UR) could
be elicited yet no CR could emerge; however, the CR was later shown to
have developed normally under sodium amytal immobilization, since it
occurred when the CS was presented to the animals after they had
recovered from drug effects.


The psycho-pharmacological complications
of the use of curare were now to come under
scrutiny in a puzzling series of experiments,
many by Girden and his collaborators. The
first of these (Girden & Culler, 1937)
demonstrated certain disturbing side effects of the
drug on the function of the brain. The semi- 
tendinosus muscle of the dog was dissected 
out, leaving its circulation and innervation 
intact. Then the subject was presented with a 
2-second bell followed by a 0.5-second shock 
to the paw of the prepared leg while the sub- 
ject was totally curarized. However, the US 
did elicit a twitch of the semitendinosus 
muscle, even though it did not produce a limb 
movement. A good semitendinosus muscle 
twitch CR developed in response to presenta- 
tion of the CS under curare. Yet this CR 
vanished when the dogs were allowed to re- 
cover from the effects of curare. The CR 
reappeared when the dogs were curarized 
again. When a group of dogs was condi- 
tioned in the normal state, then tested under 
curare, there was no detectable CR of the 
semitendinosus muscle. Yet the CR reap- 
peared when the effects of curare wore off. 
There thus was a lack of transferability of
conditioning from one state to the other. 
Girden and Culler (1937) attributed the find- 
ings to a separation of cortical and sub- 
cortical conditioning processes; they said: 


It is thus conceivable that under curare the 
normal cortical dominance is inhibited, and 
that conditioning therefore occurs at sub- 
cortical levels (thalamus). When the ani- 
mal revives, the cortex again functions 
normally and the (conditioned) thalamic 
activities are inhibited. Likewise the CR 
established in the normal animal (with 
participation of the cortex) is inhibited 
under curare (due to general inhibition of 
the cortex) (p. 272). 


Whether or not this interpretation would 
endure, it was, however, certain that the 
curare used in these experiments produced 
complicated central nervous system effects; it
was not merely a peripheral-motor blocking 
agent. 

Harlow and Settlage (1939) designed an 
experiment to detect the effect of curarization 
of the upper body upon the ability of cats to 
display previously learned responses of the 
lower body. The subjects’ aortas were pre- 
pared so that they could be tied off in the 
abdominal region. Then the subjects were 
conditioned to give vigorous hind leg with- 
drawals, using a shock as US, and a bell as 
CS. After conditioning, the aortas were 
clamped off and curare was injected into the 
upper body. The upper body parts became 
paralyzed, while the lower body parts could 
move until they became cyanotic due to the 
occluded blood supply. Vigorous CRs per- 
sisted for short periods, perhaps no longer 
than 3 minutes after curarization. It was 
not possible to tell whether the
disappearance of CRs was due to collateral circulation
of curare to the legs or to lack of adequate 
blood supply. The experimenters felt that 
they had demonstrated that curare does not 
immediately affect the CNS, at least not for 
3 minutes following injection. 

Culler, Coakley, Shurrager, and Ades (1939) bared the spinal roots
and the cerebral cortices of nine dogs. They located the spinal root
locus and cerebral cortex locus for electrical elicitation of twitches
of the semitendinosus muscle. Then they curarized the dogs and
measured the stimulation intensities necessary to produce a constant
magnitude of excursion of the semitendinosus muscle. They found that
when they compared these stimulation intensities for curarized and
normal dogs, a striking result emerged. Curare elevated the excitation
time as well as the rheobase for the cortex, while there was almost no
effect, and perhaps the opposite, at the ventral root. The
experimenters felt that somewhere between the cortex and the ventral
spinal root lies a plane of neural cleavage above which curare is a
depressant. While this might explain the findings of Girden and Culler
(1937), it still did not agree accurately with the observations of
Harlow and Stagner (1933) on the successful development of pupillary
conditioning under curare, unless one would assume further that
pupillary conditioning occurs below the postulated plane of neural
cleavage.

Harlow accepted the conclusion that curare was a CNS depressant.
However, he felt that if dosage could be adequately adjusted, one
might produce sufficient immobilization without severe CNS side
effects. Therefore, in an extensive series of experiments, he employed
what he called “incomplete curarization.” Using 11 cats that were
incompletely paralyzed, he established conditioned escape responses
(with a 4-second CS and 1-second shock) with some difficulty. In
contrast, conditioned pupillary reactions developed quickly. Later in
the normal state, the conditioned escape responses were vigorous, as
were the pupillary reactions. Harlow felt that curare is a CNS
depressant; but in light doses it permits transferable CRs to be
formed. He (Harlow, 1940) further stated that a failure of learning to
proceed normally under curare was due to the cerebral depressant
effects. “Thus the data from the curare experiments cannot be used to
support the motor theory of learning, and clearly refute such a theory
if it is to be applied to all learning phenomena” (p. 281). This
conclusion was not very convincing
when applied to learned skeletal responses.
After all, Harlow’s subjects had been able 
to perform escape responses during the cu- 
rarization period. 

Then followed as persevering and careful a series of experiments as
one can find in modern experimental psychology. Over a span of years
from about 1938 to 1947, Edward Girden elucidated the behavioral
properties of the curare drugs then available (curare and
erythroidine); our knowledge of “dissociation” under curare comes from
his work. Girden (1940) performed a bilateral surgical extirpation of
the cortical auditory areas of dogs. After recovery from the
aftereffects of surgery, the semitendinosus muscle was exposed. Then
the dogs were subjected to a conditioning procedure identical to that
used in Girden’s 1937 experiment (Girden & Culler, 1937), using an
auditory CS and shock US. Dogs conditioned while curarized gave good
CRs later in the normal state; and dogs conditioned in the normal
state gave CRs later when curarized. This outcome was in direct
contrast to that of the earlier experiment. Here is what Girden (1940)
thought the results meant:


It is clear from the present results that
conditioned responses of the semitendinosus
muscle to an auditory stimulus in the
normal animal are mediated by cortical
pathways, whereas the CRs developed
under curare involve subcortical systems. 
When a section of the cortical pathway is 
literally extirpated, the block between the 
normal and curare states is disrupted; the 
CRs established in either of the two con- 
ditions will now be manifested in the other 
state (p. 404). 


Girden voiced the conviction that normal
animals conditioned under curare are probably
unconscious. Furthermore, he (Girden,
1940) entertained the idea that even in his
dogs with removed auditory cortex the same
condition might have existed.


In the partial decorticate, the amnesic con- 
dition is disrupted so that the curare CR 
now appears in the subsequent normal 
period. But the animal still appears to be 
unaware of what had occurred under the 
drug. That is, the animal does not mani- 
fest the diffuse struggle behavior ordi- 
narily present during the first stages of 
conditioning. . . . It seems as if the animal 
is still unaware of what had happened 
under the drug, the semitendinosus CR 
occurring much like the motor automa- 
tisms reported in hysterics (pp. 405-406). 
If this conclusion were true, then the
usefulness of curare for the investigation of
learning without skeletal responding would
certainly be nil.

Girden (1942a) used intact dogs rather than the excised
semitendinosus preparation. In addition to the old curare drug, he
used erythroidine, a variant of curare. Some subjects were conditioned
to give defensive limb withdrawals to a CS while in the normal state.
They were later tested in the drugged state and showed no retention.
Some subjects were conditioned by CS-US pairings while drugged and
were later tested while normal. These showed no retention either. Even
when the dosages of curare or erythroidine were light, permitting
gross bodily responses to the US during conditioning in the drugged
state, there was no retention in the normal state. This dissociation
phenomenon was likened to amnesia.

Girden (1942b) then extended his studies to the conditioning of
visceral reactions. A glass cannula was inserted in the common carotid
artery of the dog for purposes of blood pressure recording. Using a
light flash as CS and shock to forepaw as US, and paralyzing the dogs
with erythroidine, Girden showed that a blood pressure increase could
be readily conditioned, extinguished, and reconditioned. However, the
retention of a blood pressure CR in the normal state was not measured.
On this omission, Girden (1942b) said: “It is safe to infer, however,
that the blood pressure CR, developed in the drug state, was
suppressed after recovery as were the other components of the drug
state CR which have been studied” (p. 230). This conclusion was based
on the fact that Girden’s dogs, after recovery from drug effects, did
not respond to the CS with startle, respiration changes, or pulse rate
changes.

In view of the fact that Harlow and 
Stagner (1933) had believed that the con- 
ditioned pupillary reflex, established under 
curare, was retained later in the undrugged 
state, the question of dissociation for vis- 
ceral or autonomic CRs was certainly not 
settled yet. Therefore, Girden (1942c) re- 
turned to the study of the conditioned pupil- 
lary reaction. In dogs immobilized by 
either curare or erythroidine, a conditioned 
pupillary dilation was established, using a 
bell CS and shock to forepaw as US. The 
conditioned reaction was shown to be due 
to sympathetic innervation, and not due to 
inhibition of parasympathetic innervation, 
since this CR failed to develop if sym- 
pathetic fibers to the pupil were cut. The 
pupillary CR was readily extinguishable in 
the drug state. But the CRs established in
the drug state did not transfer to the later 
normal state. So Girden (1942c) stated: 
“We cannot concur, therefore, with the con- 
clusion that ‘curare produces different in- 
fluences upon the various components of a 
total conditioned response pattern’” (p. 
331). He was in complete disagreement 
with Harlow and Stagner (1933) on this 
point. In addition, Girden, in order to main- 
tain that the dissociation phenomenon was 
due to a block between cortical and sub- 
cortical functions, would have had to as- 
sume that pupillary conditioning in the 
normal state was cortically mediated. He 
was, however, silent on this point at the 
time. Later, Girden (1943a) experimented 
with different levels of erythroidine dosage, 
producing mild to very deep paralysis in 
dogs. He found that autonomic CRs were 
established under erythroidine but were not 
detectable later in the normal state, no 
matter how deep the flaccid paralysis of 
striate muscles had been. He further found 
that only those striate muscle reactions
actually made under erythroidine can become
conditioned. When they do, they are part 
of a generalized “excited” emotional
reaction pattern. Girden conjectured that a
human subject, rendered partially immobile
by erythroidine, would be unconscious; but
later on in the normal state, any autonomic
CRs established in the drug state would 
be rearoused. The subject would perceive 
his own autonomic upset without knowing 
how it had developed. Such autonomic CRs 
would only transfer to the normal state if 
they were artifacts of a general skeletal 
emotional response pattern which had oc- 
curred under light paralysis. Thus Girden 
tried to tie together autonomic and skeletal 
CRs into an emotional reaction pattern. 
The autonomic reactions are side effects of 
the skeletal emotional responses. With deep 
paralysis, therefore, autonomic CRs would 
not later transfer to the normal state. 

In the same year, Girden (1943b) showed 
that the EEG in rhesus monkeys and dogs 
remained normal under flaccid paralysis in- 
duced by erythroidine. This was true even 
after repeated dosages of erythroidine had 
immobilized subjects for as long as 4 hours. 
This finding put in doubt the role of cortical 
depression in the dissociation phenomenon. 
But it did show that artificial respiration,
required in deeply curarized subjects, was 
adequate for normal brain function in this 
and previous experiments in Girden’s labora- 
tory. This was no trivial finding, in view of 
the peculiar amnesias found with curare
immobilization. Finally, Girden (1947)
extended his studies to rhesus monkeys. Using
erythroidine, four subjects were avoidance 
trained under light dosages. Using both 
visual and acoustic CSs, CRs rapidly de- 
veloped under erythroidine. Girden (1947) 
reported that such CRs often developed 
more rapidly in the drug state than in the 
normal state. 


As previously observed in the dog, there 
is a functional dissociation in the curarized 
monkey with the result that drug state 
learning is completely repressed upon com- 
plete recovery from the drug. The CR 
appears spontaneously when the monkey 
is again curarized (p. 587).

The development of autonomic CRs was
normal under erythroidine, but there was 
no transfer to the normal state. We wish 
to emphasize the striking finding that dis- 
crete, localized skeletal CRs, as well as 
autonomic CRs, were found to develop faster 
in the drug state than in the normal state. 
Girden had previously noted this in dogs, 
and had commented upon it. In one study, 
using the semitendinosus response, 17
subjects had reached an arbitrary conditioning
criterion under curare in 23 to 125 trials.
In the normal state, another group of
subjects had required 100 to 250 trials to reach
criterion. The observations on monkeys
confirmed this finding. 

Morgan (1951), in tracing the history of the use of curare and
erythroidine in behavioral experiments, accepted Girden’s general
conclusions: “Thus the story is now simple and clear. Conditioning
reactions, whether skeletal or autonomic, established under curare or
erythroidine, do not carry over to the normal state and vice versa”
(p. 771). Clearly these two drugs could not be legitimately used for
the original purpose of the Harlow and Stagner experiment. If one
wished to study the importance of the peripheral skeletal response in
learning, other immobilizing techniques would be required.

It turned out that Girden stopped his experimentation on curare
drugs too soon. A new curare derivative, d-tubocurarine, had been
administered to a human subject for experimental purposes in 1945. The
object was to test its efficacy as an anesthetic. For several years
prior to this time, both curare and erythroidine had been used as an
anesthetic supplement for surgery. Sometimes they had been used alone
for surgery. They were effective muscular relaxants, and as such were
an aid to surgical procedures.
However, when d-tubocurarine was used
alone as an anesthetic, sometimes the pa- 
tient would complain after the operation 
that he had experienced severe pain. This 
had not happened with curare or erythroi- 
dine. Therefore, Smith, Brown, Toman, and 
Goodman (1947) used the new paralyzing 
drug, d-tubocurarine, on Smith to see 
whether it had anesthetic or cerebral de- 
pressant properties. Smith was given a dose 
large enough to induce total flaccid paraly- 
sis. He was artificially respirated. The 
other three experimenters presented their 
colleague with a wide variety of external 
stimuli, some painful, while he was immo- 
bile. After the subject recovered he was 
able to relate in accurate detail most of 
what had transpired. He said his “senses” 
were extremely “clear” under d-tubocura- 
rine. His blood pressure remained normal 
during the procedure, and except for the
discomfort of needing to cough and being
unable to do so, he suffered no ill effects. This
outcome, the authors stated, agrees with the 
finding of Girden (1948) that curare drugs 
do not depress the EEG as long as artificial
respiration is adequate. It is not clear
why Smith et al. (1947) ignore Girden’s
work on dissociation while accepting that
on the EEG. They state that Girden, employing
CR techniques with animals, believed
his results to indicate that curare
causes unconsciousness or amnesia; but they
add that his evidence is not very conclusive.
On the contrary, we would feel that his
evidence was good, and it agreed with the
clinical findings on the old curare as an
anesthetic agent. Clearly, d-tubocurarine
was quite different from curare and erythroidine
in its psychological action.

The Smith et al. (1947) findings have
been partially confirmed by perceptual tests
with human subjects. McIntyre, Bennett,
and Hamilton (1951) found that doses of
d-tubocurarine left CNS processes unaffected.
Unna and Pelikan (1951) found,
with human subjects (six healthy volunteers),
that small, subparalytic doses of
d-tubocurarine did not alter a wide variety
of perceptual functions. They (Unna &
Pelikan, 1951) stated: “No evidence was
obtained of any action other than on the
neuromuscular junction. . . . In particular,
no effects on autonomic organs and
none on cerebral functions could be demonstrated
. . .” (p. 480). Lauer (1951) demonstrated
the transfer of learned forelimb
flexion in two dogs from the immobile state
to the normal state, using d-tubocurarine to
produce complete paralysis. Beck and Doty
(1957) have also confirmed the same type of
finding in cats, using a cataleptic compound, 
bulbocapnine. In an extensive monograph 
now in press, Black, Carlson, and Solomon 
(1962) have demonstrated several related 
phenomena of autonomic conditioning in 
dogs paralyzed by curare. They have con- 
firmed the fact that the more recent curare 
drugs (flaxedil, d-tubocurarine) produce a 
complete flaccid skeletal paralysis without 
the dissociative or amnesic effects produced 
by the curare drugs used previously in the 
pioneering work by Harlow and Stagner 
(1933), Girden (1940, 1942a, 1942b), and 
others. The classical conditioning of ANS 
reactions in curarized dogs has been shown 
to transfer to conditioned reflexes in the 
normal, undrugged state. In addition, the 
development of ANS discriminative, condi- 
tioned reactions has been demonstrated in 
curarized dogs. This means that ANS con- 
ditioning is not merely an artifact of skeletal 
responding or its proprioceptive feedback;
it means also that the original experimental 
idea of Harlow and Stagner can now be 
carried out as a satisfactory test of the 
“central process” versus “peripheral process” 
interpretations of learning and transfer. 


METHOD 


Subjects 


Subjects were six healthy mongrel dogs 
obtained from the Harvard Medical School 
animal farm. They were of medium size 
(10-12 kilograms). Their previous histories 
were unknown because they were strays. 


Apparatus 


The apparatus has been described in some 
detail by Black (1958). A Pavlov-type 
harness held the subject in a comfortable, 
suspended posture. Two panels were placed 
on either side of the subject’s head. The 
stimulus light (S°) was placed 1 foot above
and 4 feet in front of the subject’s head.
The light was provided by a 40-watt bulb.
The tones (S* and S~) were provided by
two Bud Code-practice oscillators, set at 
approximately 160 cycles per second and 
1200 cycles per second. They were placed 
4 feet in front of, and 2 feet above the dog’s 
head. The amplitude was set at a level that 
seemed nonaversive to the experimenters. 

Appropriate interval timers, relays, and 
programming equipment were used to con- 
trol the CS-US interval, US duration, and 
intertrial intervals. The accuracy of the 
CS-US interval timer was 10 seconds ± 0.05
second. The accuracy of the shock duration
timer was 5 seconds ± 0.02 second.

The EKG was recorded on a Grass four- 
channel polygraph. Alligator clip electrodes,
attached to shaved skin areas, were
used to pick up EKG impulses. The shock 
electrodes were those described by Black 
(1958); they were brass plates embedded 
in Bakelite, and they were laced tightly to 
the hind toe pads. 


Procedure 


Stage one—avoidance training. The sub- 
jects were placed in the harness, EKG and 
shock electrodes were attached, and the 
panels were adjusted so that each dog could
readily press either panel. The experimenters
then left the experimental room.
Lights were turned off (with the exception 
of the signal light and one dim 5-watt lamp 
which hung above and behind the subject). 
The signal light shone directly on the sub- 
ject’s face. In the door to the experimental 
room was a one-way mirror through which 
the experimenters observed the subjects. 

When the initial struggling period seemed 
over and the subject was calm, training was 
begun. At least three test trials were given 
at the outset, in order to be sure that there 
was no initial panel-pressing response to the 
S° (light out). Then the S° was paired 
with shock, with a CS-US interval of 10 
seconds. Shock stayed on until the subject 
pressed either panel, at which moment S° 
and shock were simultaneously terminated. 
The intertrial intervals were either 1, 1½,
or 2 minutes, presented in random order.
Stimulus presentation, trial presentation, re- 
cording of responses, as well as control of 
the S° and shock onset, were automated so 
that the subject could be left unattended. 
If the subject pressed a panel in less than 
10 seconds following CS onset, then S° was 
instantaneously removed and no shock was 
presented. The subjects were trained until 
they gave reliable avoidance responding. 
Pressing either panel prevented shock. A 
criterion of at least 20 consecutive avoid- 
ance responses was required, but each sub- 
ject was trained to a different criterion. 
The differing avoidance criteria were used 
in order to vary the number of trials since 
the last shock was received prior to curare- 
conditioning. Then the subjects were tested 
for sensitization by presentations of the two 
stimuli which were to be used in the next 
stage of the experiment. A 160 cycles per 
second tone and a 1200 cycles per second 
tone were presented three times each, in a
random order. 
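
The Stage-one contingency described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function and field names are hypothetical, and the only values taken from the text are the 10-second CS-US interval and the rule that a press before shock onset cancels the shock, while a later press terminates S° and shock together.

```python
from dataclasses import dataclass
from typing import Sequence

CS_US_INTERVAL_S = 10.0  # from the text: shock is scheduled 10 s after S° onset


@dataclass
class TrialOutcome:
    avoided: bool            # True if the panel was pressed before shock onset
    shock_duration_s: float  # seconds of shock actually received on this trial


def run_avoidance_trial(press_latency_s: float) -> TrialOutcome:
    """Classify one Stage-one trial from the panel-press latency.

    press_latency_s is measured from S° onset (hypothetical input name).
    """
    if press_latency_s < CS_US_INTERVAL_S:
        # Avoidance: S° is removed at the press and no shock is presented.
        return TrialOutcome(avoided=True, shock_duration_s=0.0)
    # Escape: shock stays on until the press, which ends S° and shock together.
    return TrialOutcome(
        avoided=False,
        shock_duration_s=press_latency_s - CS_US_INTERVAL_S,
    )


def consecutive_avoidances(latencies_s: Sequence[float]) -> int:
    """Length of the run of avoidance responses at the end of a session."""
    run = 0
    for latency in reversed(latencies_s):
        if not run_avoidance_trial(latency).avoided:
            break
        run += 1
    return run
```

A criterion check such as `consecutive_avoidances(latencies) >= 20` would then correspond to the minimum criterion stated in the text; the individual criteria actually used for each dog are not modeled here.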

Stage two—conditioning 
After the subject met the avoidance criterion,
he was curarized. D-tubocurarine
was administered i.v., with at
least 0.7 cubic centimeter per kilogram, or
2.0 milligrams per kilogram, needed to ensure
total flaccid paralysis. A gradual injection
procedure was used, starting with
an initial dose of 6.0 milligrams, and thereafter
injecting 0.3 milligram per minute.
When respiration stopped, usually 30 seconds
after an initial 6.0 milligrams dose, an
endotracheal tube was inserted, and artificial
respiration was begun.

Conditioning was started when the subject 
showed no respiratory movements, no cor- 
neal lid reflex, no ear or tongue twitches, 
and no paw movements. Conditioning trials 
were carried out in the Pavlovian manner. 
A delayed conditioning procedure was used 
with a 10-second CS-US interval and a 
shock of 5 seconds duration and of 4 milli- 
amperes intensity. Thus the CS lasted 15 
seconds, and the US occurred during the 
last 5 seconds of this interval. The CS was 
terminated at the moment of shock termination.
Two stimuli were used: S*, a tone
paired with shock, and S~, a tone never
paired with shock. The 160 cycles per
second tone was S* for dogs number 21, 22,
23, and 25. The 1200 cycles per second tone
was S* for dogs number 32 and 33.
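
The timing of a single curare-conditioning trial can be laid out as a small data structure. This is a sketch only: the class and function names are hypothetical, and the values (10-second CS-US interval, 5-second shock at 4 milliamperes, CS and shock ending together) are the ones stated in the text. The 15-second duration of the unpaired tone follows the description given in the Summary.

```python
from dataclasses import dataclass
from typing import Optional

CS_US_INTERVAL_S = 10.0   # from the text
SHOCK_DURATION_S = 5.0    # from the text
SHOCK_INTENSITY_MA = 4.0  # from the text


@dataclass(frozen=True)
class ConditioningTrial:
    cs_onset_s: float
    cs_offset_s: float
    us_onset_s: Optional[float]   # None on trials without shock (S~)
    us_offset_s: Optional[float]


def make_trial(paired_with_shock: bool) -> ConditioningTrial:
    """Build the timeline of one delayed-conditioning trial.

    The CS lasts 15 s on both trial types; on paired (S*) trials the
    shock fills the last 5 s and ends at the same moment as the CS.
    """
    cs_offset = CS_US_INTERVAL_S + SHOCK_DURATION_S
    if paired_with_shock:
        return ConditioningTrial(0.0, cs_offset, CS_US_INTERVAL_S, cs_offset)
    return ConditioningTrial(0.0, cs_offset, None, None)


plus_trial = make_trial(paired_with_shock=True)    # an S* trial
minus_trial = make_trial(paired_with_shock=False)  # an S~ trial
```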

All six subjects were given 99 curare- 
conditioning trials. The following sequence, 
+++ —+4+-4+-4+--—-+--—-—, was 
repeated five times and then was followed 
by a sequence of 19 alternated + and -
trials, ending on the 99th trial with a +
trial.
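
The trial counts in this schedule can be checked with a little arithmetic. The sketch below does not attempt to reproduce the printed order of + and - trials within the repeated block; it only derives the block length implied by the totals and builds the alternating tail. The helper name `alternating_tail` is hypothetical.

```python
TOTAL_TRIALS = 99   # from the text
BLOCK_REPEATS = 5   # the block sequence was repeated five times
TAIL_LENGTH = 19    # alternated + and - trials at the end

# Trials left for the repeated block, split evenly over the five repeats.
block_length, remainder = divmod(TOTAL_TRIALS - TAIL_LENGTH, BLOCK_REPEATS)


def alternating_tail(length: int, last: str = "+") -> list:
    """Strictly alternating + / - trials that end on the given trial type."""
    other = "-" if last == "+" else "+"
    # Build backwards from the final trial, then reverse into running order.
    backwards = [last if i % 2 == 0 else other for i in range(length)]
    return backwards[::-1]


tail = alternating_tail(TAIL_LENGTH)
```

Under these totals each repeated block must contain 16 trials, and a 19-trial alternation that ends on a + trial necessarily also begins on one, giving 10 + trials and 9 - trials in the tail.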

Stage three—transfer tests. After the 
curare-conditioning session, all subjects were 
given at least 48 hours to recover from the 
side effects of curarization. Then they 
were tested in the normal state in the same 
apparatus as that used previously for both 
avoidance training and curare-conditioning. 
All three CSs were presented, and the be- 
havior of the six subjects in response to 
these stimuli was recorded. No shocks were 
given, so the test sessions were extinction 
sessions. 

The order of test presentations of the 
three types of CSs presented serious prob- 
lems of potential biasing. While there were 
no shocks presented on these test trials,
there was the possibility that order effects 
could either enhance or diminish the panel- 
pressing responses to the three CSs. For 
example, if the subject were at first presented
with a series of S~ presentations
and did not respond, there was the
possibility that responses to the S* or S°
might become less probable due to some
type of generalization of inhibition. Or,
perhaps, if a long series of S° trials were
presented, this might facilitate responding
to S~ and S* due to some type of generalization
of excitation or a sensitization (or
pseudoconditioning) phenomenon. We were
not certain how to solve this problem with- 
out a very large number of subjects. There- 
fore, we explored different orders of CS 
presentation with different subjects. The 
purpose was to eliminate biasing for each 
subject, in order to determine whether 
the three CSs had differential elicitation 
strengths. 

Latencies of the panel-pressing responses 
were measured from CS onset to the time 
the panel was depressed 1 inch. The CS 
was arbitrarily terminated if the subject 
did not press the panel in 15 seconds (the
time interval initially required for the
10-second CS-US interval plus the 5-second
shock duration period). If no response was
made in 15 seconds, it was recorded as an
infinite latency. It was known from previous
studies that the termination of the CS
after a short time period facilitates extinction.
Therefore, we were using conservative
conditions for the demonstration of response
differentiation.


RESULTS 


In Table 1 are listed the total number
of shocks received by each subject
during avoidance training and the
number of consecutive avoidance responses
made in reaching the arbitrary
avoidance learning criterion.


TABLE 1

SHOCKS RECEIVED, AND RESPONSES TO REACH
CRITERION, FOR ALL SUBJECTS

Subject's      Name of     Number of    Consecutive avoidance
code number    subject     shocks       responses to
                           received     criterion

21             Harry
22             Bugeye
23             Buff
25             Nipper
32             Punc
33             Mahog

The latencies of panel-pressing are
the more important observations of this
experiment; these are plotted in Figures
1-6 for the last 20 training trials
prior to curare-conditioning and for
the test trials after recovery from curare
effects. The data on sensitization
tests are not plotted since all these
pretest responses to S* and S~ were
of infinite latency. In Figures 1-6 responses
to S° are indexed by filled
circles; responses to S* are indexed
by closed triangles, and responses to
S~ are indexed by open triangles. On
the ordinate is the reciprocal of latency
(× 100) on a logarithmic scale. The
arbitrary infinite latency (no response
in 15 seconds) is at the bottom of the
ordinate scale.
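
The transformation used for the ordinate can be written out explicitly. This is a minimal sketch, assuming that any trial with no press within the 15-second cutoff is treated as an infinite latency and given no finite value; the function names are hypothetical, and how the figures position the infinite-latency points beyond "at the bottom of the scale" is not modeled.

```python
import math
from typing import Optional

CUTOFF_S = 15.0  # CS was terminated after 15 s without a press (from the text)


def plotted_value(latency_s: Optional[float]) -> Optional[float]:
    """Return 100 / latency, or None for an 'infinite' latency.

    None as input means that no response occurred on the trial.
    """
    if latency_s is None or latency_s >= CUTOFF_S:
        return None
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return 100.0 / latency_s


def log_position(latency_s: Optional[float]) -> Optional[float]:
    """Position on a base-10 logarithmic ordinate, or None if no response."""
    value = plotted_value(latency_s)
    return None if value is None else math.log10(value)
```

With this convention short latencies map to large values, so quick responses appear near the top of each figure and failures to respond fall at the bottom.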

It is convenient for us to describe
our results by individual subjects. All 
subjects lead us to the same general 
conclusions but show slightly differing 
phenomena. 

Subject #21, shown in Figure 1, on
the last 20 trials before curare-conditioning,
showed normally stable avoidance
responding to the original training
stimulus, S°. When tested on S* and
S~ before the curare-conditioning
trials, he gave no responses. Like
most subjects, he pricked up his ears,
looked around, but made no incipient
panel-presses. After curare-conditioning,
we can see that there still was no
response made to the S~. On the other
hand, a few responses appeared to S*,
and these were of short latency. The
latencies of responses to S° were of the
same order of magnitude as were those
prior to curare-conditioning. One response
was made to S~ in 35 tests.
Ten responses were made to S* in 42
tests. We felt that the presentation
of 10 S~ tests at the outset inhibited
the responding to S* on later tests.
We were especially struck by the fact
that 30 subsequent test trials on S°
led to a facilitation of panel-pressing to
S*. Following these 30 S° trials, there
were 7 responses in 18 S* trials and 1
response in 15 S~ trials. The results
on this animal were promising enough
to lead us to replicate the experiment
on other subjects.

FIG. 1. Transfer effects in Dog #21. Latencies of panel-pressing responses to the original
training stimulus, S°, and to the two stimuli, S* and S~, used in the curare-conditioning procedure.
(Note that the latencies are expressed in reciprocal form. Note that this dog shows
very little differentiation of S* and S~.)

FIG. 2. Transfer effects in Dog #22. Latencies of panel-pressing responses to the original
training stimulus, S°, and to the two stimuli, S* and S~, used in the curare-conditioning procedure.
(Note that the latencies are expressed in reciprocal form. Note that this dog shows
very little differentiation of S* and S~ on early test trials, but there is clear evidence of more
rapid extinction of panel-pressing to S~ than to S*.)

Subject #22, shown in Figure 2,
achieved stable avoidance responding
prior to curare-conditioning. When
tested with S~ and S* prior to curare-conditioning,
he showed no panel-presses,
indicating no sensitization effect
at that stage of the experiment.
Following curare-conditioning, however,
this subject pressed the panel in
response to all three CSs. But after
approximately 40 test trials, the latencies
of avoidance responding to S~
were clearly more variable than those
to S*. After approximately 80 test
trials, the response latencies to S~ were
significantly longer than those to S*,
and there were several failures to respond
to S~. If we look at Test Trials
110-160, there can be no doubt that
Subject #22 was responding strongly
to S* but very poorly or not at all to
S~. There were nine failures to respond
to S~ on Test Trials 110-137.
On Test Trials 138-163, there were
no failures to respond to S*. Interestingly
enough, when finally tested on
S°, on which Subject #22 had originally
been trained, there were 13 failures
to respond in 17 tests.

Subject #23, shown in Figure 3,
gave stable avoidance responding to
S° prior to curare-conditioning. When
tested for sensitization on S* and S~
prior to curare-conditioning, he gave
no responses. Following curare-conditioning,
Subject #23 was first tested
on S~ and failed to respond in 10 tests.
Then, when S* was next presented, he
responded on all 10 tests, but the
latencies were longer than those to
S° had been prior to curare-conditioning.
After three failures to respond
again to S~, there were three long-latency
responses to S* and then there
were 10 failures to respond. Extinction
on S* was not, however, as complete
as on S~. When finally tested
on the original S°, Subject #23 showed
stable responding, similar to that
shown prior to curare-conditioning.
Such tests on S° seemed to facilitate
subsequent responding to S* but not
to S~. There were no responses to S~
during the whole course of testing.
However, when S° was tested, subsequent
tests on S* always yielded a few
responses followed by extinction.
Finally, the response to S° extinguished
to some extent after S° had
been presented 45 times over Test
Trials 47-99. On Test Trials 108-122
there were 11 failures to respond to S°.
Thus, even when the panel-pressing
avoidance response was in the process
of extinguishing, the differentiation between
S~ and S* was striking.

FIG. 3. Transfer effects in Dog #23. Latencies of panel-pressing responses to the original
training stimulus, S°, and to the two stimuli, S* and S~, used in the curare-conditioning procedure.
(Note that the latencies are expressed in reciprocal form. Note that this dog
shows excellent differentiation of S* and S~ at the beginning of transfer tests, and that
responses to S* extinguish rapidly.)

FIG. 4. Transfer effects in Dog #25. Latencies of panel-pressing responses to the original
training stimulus, S°, and to the two stimuli, S* and S~, used in the curare-conditioning procedure.
(Note that the latencies are expressed in reciprocal form. Note that the responses
to S~ extinguish quickly while those to S* remain stable.)

FIG. 5. Transfer effects in Dog #32. Latencies of panel-pressing responses to the original
training stimulus, S°, and to the two stimuli, S* and S~, used in the curare-conditioning procedure.
(Note that the latencies are expressed in reciprocal form. Note that there is good
differentiation of S* and S~ only after some tests on S° are given. Note the long latencies of
responses to S~ as contrasted with the quick responses to S*.)

Subject #25, shown in Figure 4, like
the other subjects, achieved stable
avoidance responding prior to curare-conditioning.
He showed no sensitization
phenomenon prior to curare-conditioning
when pretested on S~ and S*.
Subsequent to curare-conditioning,
however, there was good evidence of
differentiation between S~ and S*. We
started testing with S~ and obtained immediate
panel-pressing. The latencies
were variable, however, and many of
them were longer than those previously
seen in response to S° prior to the
curare-conditioning session. In contrast,
responding to S* was stable and
of the same order of magnitude as that
obtained on S° prior to curare-conditioning.
There were no failures to
respond quickly to S*. But there were,
between Test Trials 1 and 65, 23 failures
to respond in 35 presentations of S~.

We then embarked on a long series
of tests, extending over three experimental
days, during which we tried to
extinguish the panel-pressing response.
We alternated long sequences of S* and
S°, testing infrequently on S~, hoping
to weaken the panel-pressing response.
In Figure 4 we show the results of
this testing after 420 trials. The response
to S° was clearly weaker than
it had been. The response to S~ was
still absent. In contrast, the response
to S* was only slightly weaker than
it had been about 50 trials earlier.
Here, then, was a case where curare-conditioning,
even though the panel-pressing
response could not be performed
during it, produced a stronger
avoidance response to S* than that to
S°, even though the panel-press responses
were made originally to S°
and not to S*!

Subject #32 is shown in Figure 5.
He, like the other subjects, achieved
stable avoidance response latencies prior
to curare-conditioning. When tested for
sensitization with S~ and S*, he gave
no panel-pressing responses. After
curare-conditioning, Subject #32 was
first tested with S~, and he made no
panel-pressing responses. Then, when
tested with S* he again made no responses.
Finally, when tested with
the original training stimulus, S°, he
responded with characteristic short latencies.
Following three quick responses
to S°, he responded with
longer latencies to S*. Then there
were two responses to S~, followed by
a failure to respond. From Test Trials
40-76, there were no responses to 16
presentations of S~; there were 21 responses
in 22 tests on S*; and
there were six responses out of nine
tests on S°. With repeated testing on
all three CSs, there was no apparent
extinction trend for S° or S*, while the
latencies of response to S~ were either
long or infinite. Over Test Trials 158-246,
the subject gave 26 short-latency
responses to 26 S° presentations; he
gave 30 short-latency responses to 30
S* presentations; and in contrast, he
gave 19 responses in 33 presentations
of S~, and these were of long latency.
While there was no significant difference
between response latencies to S°
and S*, the latencies to S~ differed significantly
from those of S* and S°.
Here, again, there was clear evidence
of differentiation in avoidance responding
to the two discriminative stimuli
used in curare-conditioning.

FIG. 6. Transfer effects in Dog #33. Latencies of panel-pressing responses to the original
training stimulus, S°, and to the two stimuli, S* and S~, used in the curare-conditioning procedure.
(Note that latencies are expressed in reciprocal form. Note the excellent differentiation
of S* and S~ at the outset of testing. Note the maintenance of this differentiation over
150 test trials.)
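
The counts reported for Subject #32 over Test Trials 158-246 can be turned into response proportions with a few lines of code. This is an illustrative sketch: the dictionary keys are plain-text labels chosen here for the three stimuli (S°, S*, and S~), and only the response and presentation counts come from the text. No latencies are modeled, so the sketch says nothing about the significance tests mentioned above.

```python
from fractions import Fraction

# (responses, presentations) as reported for Test Trials 158-246.
counts = {
    "S0": (26, 26),  # original training stimulus, S°
    "S+": (30, 30),  # tone paired with shock under curare, S*
    "S-": (19, 33),  # tone never paired with shock, S~
}

FIRST_TRIAL, LAST_TRIAL = 158, 246


def response_rate(responses: int, presentations: int) -> Fraction:
    """Exact proportion of presentations that produced a panel press."""
    return Fraction(responses, presentations)


rates = {name: response_rate(*pair) for name, pair in counts.items()}
total_presentations = sum(n for _, n in counts.values())
trials_in_range = LAST_TRIAL - FIRST_TRIAL + 1

for name, rate in rates.items():
    print(f"{name}: {float(rate):.2f}")
```

The three presentation counts sum to 89, which matches the number of trials in the stated range, so the reported figures are internally consistent for this block of testing.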

In Figure 6 are shown data for Subject
#33. His avoidance responses
prior to curare-conditioning were
stable and of short latency. When
tested for sensitization by presentations
of S~ and S*, he gave no panel-presses.
After the curare-conditioning
session, he was first tested on S~.
He failed to respond to three such
presentations. Then, when tested with
10 successive presentations of S*, he
responded 10 times; the first responses
were sluggish, but they increased in
vigor, and the final tests of this sequence
yielded latencies of the same
magnitude as those characteristic of responding
to S° prior to curare-conditioning.
Then three more S~ presenta-
Three S° 
presentations then elicited short-la- 
tency panel-presses. Over the course 
of 173 test stimulus presentations, the 
responses to S* showed little change; 
the responses to S° remained 
stable. In contrast, the responses to 
S~ were practically absent; there were
only 2 panel-presses in a total of 49
S~ presentations, and these were sluggish.
Here again, there was clear evidence
of differentiation between S*
and S~.


DISCUSSION 


The results show, in all six subjects, 
that the panel-pressing responses to S* 
were more vigorous, more reliable, less 
variable, and of shorter latency than 
those to S~. Responses to S* were
also more highly resistant to extinction
effects than were those to S~. On
the other hand, the comparison of responses
to S° and S* does not yield a
clear conclusion. In some subjects, S*
was more resistant to extinction effects
than was S°, and in other subjects, the
reverse was true. We feel that the
number of shocks received in original 
avoidance training with S° may be 
a factor in producing this variability of 
outcome. Dogs #32 and #33 re- 
ceived the most shocks in training, and 
were carried out to a strict avoidance 
learning criterion, 77 and 88 trials, re- 
spectively. These two subjects had 
the best differentiation between S~ and 
S* in the tests. In addition, the test- 
ing procedure following curare-condi- 
tioning may be important. Presenting 
S° or S*, followed by other CSs, seems 
to elevate the general responding level 
to all CSs. Presenting S~ followed by 
the other CSs, seems to depress the 
general responding level to all CSs. 
And finally, the 1200-cycle tone seemed 
slightly better as an S* than was the 
160-cycle tone. The number of subjects
in our experiment is not great
enough to prove these points, nor is 
the design of our experiment adequate 
for proof. However, the indications 
are promising enough to pursue fur- 
ther with appropriate experiments. 

The theoretical implications of this
experiment are clear. Animal subjects
can acquire discriminative instrumental
response tendencies without the
overt responses themselves being exercised
in the presence of the discriminative
stimuli. The differential
strengths of panel-pressing tendencies
in the presence of S~ and S* could only
have been acquired during the discriminative
curare-conditioning session.
The pretests for sensitization
showed no generalization from S°
either to S~ or to S* prior to curare-conditioning.
Following curare-conditioning
(during which S* was paired
with shock and S~ was presented without
shock pairings), tests on all six
subjects showed stronger panel-pressing
tendencies to S* than to S~. In
some cases responses to S~ were practically
absent. Since no panel-pressing
could occur during curare-conditioning,
there could be no temporal pairing
either of S* and R_press or S~ and
R_press. The differential, discriminative
panel-pressing which revealed itself
later must have been due to the Pavlovian
procedure of classical, discriminative
conditioning. Thus, one might
ask, what were the mediators of this
special type of transfer of training or
thinking? What conditions could have
led our dogs to reveal, through their
panel-presses, what they had acquired
during the conditioning procedure?
We feel that a systematic exploration
of these questions could lead to a
better knowledge of transfer and
thought. By exclusion, we can definitely
maintain that differential peripheral
skeletal responses (and their associated
feedback) could not have been
a factor in mediating the discrimination
learning which we have clearly
demonstrated. Thus, those theories
of transfer of training and of thinking
which restrict themselves to peripheral
skeletal response mediators, and their
correlated feedback from the periphery,
cannot explain the outcome of our
experiment.

One possible mediation process
could reside in ANS conditioning and
its correlated feedback from the ANS
periphery. There was some evidence,
in a few of our subjects, that there was 
a greater cardiac response to S* than 
to S~ at the end of the curare-condi- 
tioning session. (A detailed account 
of discriminative cardiac conditioning 
in curarized dogs will appear later in 
another paper.) Possibly the differential
strengths of ANS CRs elicited
by S* and S~ could serve to mediate
the discriminative panel-pressing behavior
through correlated visceral sensory
feedback processes. If so, we do
not know the essential conditions for 
such mediation. For example, what 
outcome would be obtained if the sub- 
jects were given discriminative curare- 
conditioning first, then were trained to 
panel-press to S°, and then were tested 
on S~ and S* in the normal state? 
What would have been the outcome
if the CS-US interval used in avoidance
training to S° were shorter than,
or longer than, that used in the curare-conditioning
session when S* was
paired with shock? These questions,
which definitely suggest a long series
of parametric studies, could, if answered,
tell us a great deal about conditions
for transfer of training and for
thinking. For, despite the clear results
of this experiment, we still don’t have
a good explanation for the very “intelligent”
behavior of our dogs. What
led them to panel-press to S*? They
could have merely howled, struggled in
the harness, or have become “frozen”
if S* had frightened them, and they
could have done less of this when S~
was presented. Why did they respond
appropriately?

Our subjects clearly “put together”
two experiences in such a way that
two stimuli, never before associated
directly with lever-pressing, elicited
differential lever-pressing during the
transfer tests. The accomplishment of
this transfer phenomenon did not require
differential skeletal responding
during the differentiation phase. Of
course there may be kinds of transfer 
effects, or types of thinking, for which 
peripheral skeletal responding during 
differentiation is necessary or important.
The type of phenomenon we
have studied is clearly not one of these.


SUMMARY 


Dogs were trained to avoid shock
in response to a signal. The avoidance
response was the pressing of a panel,
the signal (S°) was a light going out, 
the shock was of 4-milliamperes in- 
tensity (applied to the hind toe pads), 
and the time interval between signal 
onset and shock onset was 10 seconds. 
If the correct response occurred dur- 
ing the time interval between signal 
onset and the usual shock presentation 
time, no shock was given and the sig- 
nal was terminated. Intertrial inter- 
vals were varied systematically, with a 
mean of 1.5 minutes. 

After the dogs were reliably pressing 
the panel in response to the signal, 
with response latencies of 3 seconds or 
shorter, they were totally paralyzed by 
curarization. While the dogs were 
thus completely immobilized under 
curare, a Pavlovian discriminative conditioning
session was carried out. A
new signal (S*) was consistently
paired with shock, using a delayed
conditioning procedure and a time in- 
terval of 10 seconds between S* onset 
and shock onset. The shock duration 
was 5 seconds. On some trials a con- 
trasting signal (S~) was presented for 
15 seconds, but it was not paired with 
shock. A sequence of 99 discrimina- 
tive conditioning trials was presented, 
ending with an S* trial. The S* and 
S~ trials were partly randomized in a 
special sequence. After this condi- 
tioning session, the dogs were given 
48 hours in which to recover from the 
various physiological side effects of 
curarization. Next they were returned
to the training situation in the normal,
undrugged state, and the three previ- 
ously used stimuli (S°, S*, and S~) 
were presented. The latency of panel- 
pressing responses was recorded dur- 
ing these tests of the efficacy of the 
three stimuli. 

The dogs responded in a way con- 
sistent with their discriminative Pav- 
lovian conditioning experience under 
curare. There were frequent panel-presses
in response to S° and S*, very
few in response to S~. When a dog
pressed a panel in the presence of S~,
the latency of the response was often 
long. 

The technique used in this experi- 
ment is a special case of training 
wherein the trainee can do nothing 
overtly about the information or stimu- 
lus sequences as they are presented. 
It therefore gives the experimenter 
very precise stimulus control, because 
the usual overt responses of the sub- 
ject are not present to modify the stim- 
ulus situation in unpredictable ways. 

This experiment demonstrates that 
certain types of transfer of training or 
problem solving can occur without the 
benefit of mediation by peripheral skel- 
etal responses or their associated feed- 
back mechanisms. Whether or not 
peripheral ANS reaction mechanisms 
play a role in such transfer phenomena 
remains to be seen, and such a possibil- 
ity should be seriously explored in 
future experiments. 


The intent of this paper is the presentation
of an associative interpretation
of the process of creative thinking.
The explanation is not directed to any 
specific field of application such as art 
or science but attempts to delineate 
processes that underlie all creative 
thought. 

The discussion will take the following
form. (a) First, we will define
creative thinking in associative terms
and indicate three ways in which creative
solutions may be achieved: serendipity,
similarity, and mediation. (b)
This definition will allow us to deduce
those individual difference variables
which will facilitate creative performance.
(c) Consideration of the definition
of the creative process has suggested
an operational statement of the
definition in the form of a test. The
test will be briefly described along with
some preliminary research results.
(d) The paper will conclude with a
discussion of predictions regarding the
influence of certain experimentally
manipulable variables upon the creative
process.

1 The essence of this paper was written ...
Psychologist at the Institute of Personality ...
stimulation his paper, “Orientation to
research on thinking” (1952). Work on this
material has been supported by the Cooperative
Research Program of the Office of
Education (Project No. 1073) and the National
Science Foundation (Grant G-3855).

Creative individuals and the processes
by which they manifest their
creativity have excited a good deal of
interest and curiosity. There are extended
analyses of novels and novelists,
poems and poets, mathematics and
mathematicians, both biographical and 
autobiographical. Perusal of the intro- 
spections of manifestly creative in- 
dividuals uncovers a surprising vein of 
similarity in the processes they describe
(Ghiselin, 1952). Thus we
find Albert Einstein’s self-searching to
suggest that “The psychical entities
which seem to serve as elements in
thought are certain signs and more or
less clear images which can be combined
... This combinatory play seems
to be the essential feature in
productive thought.” Samuel Taylor
Coleridge is described as having developed
his ideas in the following manner:
“Facts which sank at intervals out
of conscious recollection drew together
beneath the surface through the almost 
chemical affinities of common ele- 
ments.” In the field of art, we find 
André Bréton referring to a collage 
by Ernst as being distinguished by a 
“marvelous capacity to grasp two mu- 
tually distant realities without going 
beyond the field of our experience and 
to draw a spark from the juxtaposi- 
tion.” Most explicit, however, is the 
oft-quoted statement by the mathema- 
tician, Poincaré, who talks about an 
evening when “ideas rose in crowds; I
felt them collide until pairs interlocked,
so to speak, making a stable combination.
By next morning I had established
the existence of a class of Fuchsian
functions.” From these experiences,
Poincaré felt that he could state
that “to create consists of making new
combinations of associative elements
which are useful. The mathematical
facts worthy of being studied . . . are 
those which reveal to us unsuspected 
kinships between other facts well 
known but wrongly believed to be 
strangers to one another. Among 
chosen combinations the most fertile 
will often be those formed of elements 
drawn from domains which are far
apart.” An exceptionally compelling 
illustration of a useful combination of 
elements “drawn from domains which 
are far apart” occurs in a line from the 
poem, “The Monkey Puzzle” by Mari- 
anne Moore (1951), “The lion’s fero- 
cious chrysanthemum head.” 

We will state our basic hypothesis 
regarding the nature of creative think- 
ing in the form of a definition. With 
these introspective statements serving 
as background, we may proceed to de- 
fine the creative thinking process as the 
forming of associative elements into 
new combinations which either meet
specified requirements or are in some 
way useful. The more mutually re- 
mote the elements of the new combina- 
tion, the more creative the process or 
solution. An additional criterion of 
the level of creativeness of a product 
is described below. 

Creative thinking as defined here is 
distinguished from original thinking 
by the imposition of requirements on 
originality. Thus, 7,363,474 is quite 
an original answer to the problem 
“How much is 12 + 12?” However,
it is only when conditions are such that 
this answer is useful that we can also 
call it creative. There are many orig- 
inal ideas expressed in institutions for 
the mentally ill and mentally retarded;
few of these are likely to be creative. 
There are many fields of creative endeavor
in which the usefulness of products
would be difficult to measure reliably.
While these difficulties must
eventually be faced, for the present our
research efforts have been concentrated
on laboratory situations in which criteria
for usefulness can be arbitrarily
experimenter-defined and unequivocally
explained to the subject. The
originality of a response is simply in- 
versely related to its probability in a 
given population. 
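
As a hedged illustration of the last sentence, the sketch below scores originality as one minus a response's relative frequency in a sample of responses. The linear form, the function name `originality`, and the response counts are assumptions made for illustration only; the text specifies nothing beyond an inverse relation to probability.

```python
from collections import Counter


def originality(response, population_responses):
    """Return 1 minus the relative frequency of `response` in the sample.

    This linear form is an assumption; any decreasing function of the
    response probability would satisfy the statement in the text.
    """
    counts = Counter(population_responses)
    return 1 - counts[response] / len(population_responses)


# Hypothetical sample of ten answers to "How much is 12 + 12?"
sample = ["24"] * 9 + ["7,363,474"]

print(originality("24", sample))         # common answer, low originality
print(originality("7,363,474", sample))  # rare answer, high originality
```

Note that a high originality score says nothing about usefulness, which the definition treats as a separate requirement.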

It should be pointed out that this 
definition of creativity is quite similar 
to basic notions advanced by British 
associationists from Locke (1690) to 
Bain (1855), and by those psycholo- 
gists whose work is based in large 
measure on their speculations. Freud 
(1938), Hollingsworth (1928), and 
Binet (1899) may serve as examples.


WAYS OF ACHIEVING A CREATIVE SOLUTION


In terms of associative theory, we 
may point to three ways of achieving 
a creative solution. Generally, any 
condition or state of the organism 
which will tend to bring the requisite 
associative elements into ideational
contiguity will increase the probability 
and speed of a creative solution. 
Therefore, the following three ways of 
attaining creative solutions are all 
methods of bringing the requisite asso- 
ciative elements together. 

Serendipity. The requisite associa- 
tive elements may be evoked contigu- 
ously by the contiguous environmental 
appearance (usually an accidental con- 
tiguity) of stimuli which elicit these 
associative elements. This sort of cre- 
ative solution is often dubbed seren- 
dipitous. This is the manner of dis- 
covery to which is popularly attributed 
such inventions as the X ray and such 
discoveries as penicillin. One physi- 
cist has described how he has reduced 
serendipity to a method by placing in 
a fishbowl large numbers of slips of 
paper, each inscribed with a physical 
fact. He regularly devotes some time 
to randomly drawing pairs of these 
facts from the fishbowl, looking for
new and useful combinations. His 
procedure represents the operational 
embodiment of this method of achiev- 
ing creative solutions. 

Similarity. The requisite associative 
elements may be evoked in contiguity 
as a result of the similarity of the 
associative elements or the similarity 
of the stimuli eliciting these associative 
elements. This mode of creative solu- 
tion may be encountered in creative 
writing which exploits homonymity, 
rhyme, and similarities in the structure 
and rhythm of words or similarities in 
the objects which they designate. The 
contiguous ideational occurrence of 
such items as alliterative and rhyming 
associates may be dependent on a fac- 
tor such as primary stimulus general- 
ization. It seems possible that this 
means of bringing about contiguity of 
associational elements may be of con- 
siderable importance in those domains 
of creative effort which are less di- 
rectly dependent on the manipulation 
of symbols. Here we might include 
certain approaches to painting, sculp- 
ture, musical composition, and poetry. 

Mediation. The requisite associa- 
tive elements may be evoked in con- 
tiguity through the mediation of com- 
mon elements. This means of bringing 
the associative elements into contigu- 
ity with each other is of great impor- 
tance in those areas of endeavor where 
the use of symbols (verbal, mathematical,
chemical, etc.) is mandatory.
For example, in psychology, the idea 
of relating reactive inhibition and cor- 
tical satiation may have been mediated 
by the common associates “tiredness” 
or “fatigue” (Kohler & Fishback, 
1950). 


INDIVIDUAL DIFFERENCES 


From the definition given above, the 
factors that will make for individual 
differences in the probability of achieving
creative solutions may be deduced.
Any ability or tendency which serves 
to bring otherwise mutually remote 
ideas into contiguity will facilitate a 
creative solution; any ability or ten- 
dency which serves to keep remote 
ideas from contiguous evocation will 
inhibit the creative solution. 

Listed below are several illustrative 
predictions concerning individual differences
that one may make from
this theoretical orientation.


Need for Associative Elements 

It should be clear that an individual 
without the requisite elements in his 
response repertoire will not be able 
to combine them so as to arrive at a 
creative solution. An architect who 
does not know of the existence of a 
new material can hardly be expected 
to use it creatively. 


Associative Hierarchy

The organization of an individual's 
associations will influence the proba- 
bility and speed of attainment of a 
creative solution. There is a whole 
family of predictions that one may 
draw from this concept of the associa- 
tive hierarchy. As an initial example, 
let us take the question of the manner 
in which the associative strength 
around ideas is distributed. If we 
present an individual with the word 
“table,” what sort of associative re- 
sponses does he make? The individ- 
ual who tends to be restricted to the 
stereotyped responses, such as “chair,” 
may be characterized as having an as- 
sociative hierarchy with a steep slope 
(see Figure 1). That is, when you get 
past the first one or two conventional 
responses to the stimulus, the individual’s
associative strengths to other
words or ideas (lower in the hierarchy)
drop rapidly. We can also
conceive of a second sort of individual 
whose associative hierarchy is characterized
by a rather flat slope. This is
an individual who perhaps also has as 
his strongest response the conventional 
chair. But for him this response is not 
overly dominant and so it is more 
likely that he will be able to get to the 
less probable, more remote kinds of 
associations to table. It is among these 
more remote responses that the requi- 
site elements and mediating terms for 
a creative solution will be lurking. 
This slope factor may be related to 
the mathematical analysis of associa- 
tive production developed by Bous- 
field, Sedgewick, and Cohen (1954). 
It probably is closely approximated 
by their constant, m, measuring rate of 
depletion of the associative reservoir. 
They found a high negative correlation 
between rate of association and total 
number of associations. It would be
predicted from Figure 1 that the high
creative subject (flat hierarchy) would
respond relatively slowly and steadily
and emit many responses, while the
low creative subject (steep hierarchy)
would respond at a higher rate but
emit fewer responses.
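
The contrast between the two kinds of subject can be sketched numerically. The block below assumes the exponential form n(t) = c(1 - e^(-mt)) often associated with Bousfield and Sedgewick's analysis, where c is the total supply of associations and m the rate constant; the text itself names only the constant m, so the equation, the function names, and the parameter values are illustrative assumptions.

```python
import math


def cumulative_associations(t, c, m):
    """Associations emitted by time t under n(t) = c * (1 - exp(-m * t))."""
    return c * (1 - math.exp(-m * t))


def initial_rate(c, m):
    """Slope of n(t) at t = 0, which equals c * m."""
    return c * m


# Hypothetical parameters, chosen only to show the predicted pattern.
steep = {"c": 10, "m": 0.5}   # low creative: fast start, small reservoir
flat = {"c": 30, "m": 0.1}    # high creative: slow start, large reservoir

for t in (1, 5, 20, 60):
    print(t,
          round(cumulative_associations(t, **steep), 1),
          round(cumulative_associations(t, **flat), 1))
```

With these values the steep-hierarchy subject starts faster but levels off near 10 responses, while the flat-hierarchy subject keeps producing toward 30.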

[Figure 1. Associative hierarchies around the word “table.”]

It would be predicted that the
greater the concentration of associative
strength in a small number of stereotyped
associative responses (steep
hierarchy) the less probable it is that
the individual will attain the creative
solution. Thus, the word association
behavior of the high creative individual
should be characterized by less stereotypy
and commonality. This last prediction
is supported by a study by
Mednick, Gough, and Woodworth 
(Mednick, 1958). Research scientists 
rated for creativity were divided into 
relatively high (N=15) and rela- 
tively low (N = 15) groups. The low 
creatives gave more stereotyped re- 
sponses on 80% of a group of 36 test 
words from the Kent-Rosanoff list. 
(These test words were chosen for
their tendency to elicit stereotyped
responses. Stereotypy was defined by
the Minnesota Kent-Rosanoff Word
Association Norms, Russell & Jenkins,
1954). It should be pointed out
that these results lend themselves to 
another possible interpretation. The 
highly creative individual may also 
have a steep hierarchy but a deviant 
one. That is, his most dominant asso- 
ciative response may be quite strong 
but quite different from the popular, 
dominant associative response. There 
are different predictions that can be 
made for the flat-associative-hierarchy 
creative and the steep-deviant associa- 
tive-hierarchy creative. The latter is 
more likely to be the one-shot pro- 
ducer (a not uncommon phenomenon 
among novelists). If he does create 
further products, they will tend to re- 
semble closely the first product. The 
former is more likely to be a multi-producer;
he is more likely to produce
in a variety of avenues of creative
expression.

The prediction suggesting an expec- 
tation of less creativity from an in- 
dividual with a high concentration of 
associative strength in a few responses 
leads to another prediction. The 
greater the number of instances in 
which an individual has solved prob- 
lems with given materials in a certain 
manner, the less is the likelihood of 
his attaining a creative solution using 
these materials. Such an individual 
will “know the meaning” of the ele- 
ments of the subject matter. That is, he 
will have a steep associative hierarchy
around these elements. An example
of the operation of this principle
recently occurred to the writer while
teaching an honors freshman introduc- 
tory course in psychology at the Uni- 
versity of Michigan. I was giving a 
well known interpretation of a well 
known experiment in stimulus general- 
ization when interrupted by a student 
who calmly stated that the interpreta- 
tion was in error. After a few minutes 
of blustering I asked him to explain. 
His explanation proved him to be cor- 
rect. I had been dealing with this 
material for years and “knew” the 
“correct” interpretation; for him this 
material was new, and he had a low, flat
associative hierarchy. Thus, if a new- 
comer to a field has the requisite in- 
formation, he is more likely to achieve 
a creative solution than a long-time 
worker in the field. This may be the 
reason that theoretical physicists and 
master chess players are often said to 
have passed their prime by the age of 
25. 


Number of Associations 


The greater the number of associa- 
tions that an individual has to the re- 
quisite elements of a problem, the 
greater the probability of his reaching 
a creative solution. This variable is 
not independent of the preceding one 
since an individual with a high con- 
centration of associative strength in 
few associative responses is not likely 
to have a proliferation of associations. 
The more associates which are evoked 
by a requisite element of a problem, 
the more likely it is that an associate 
will exist which will serve as a medi- 
ating bridge to another requisite ele- 
ment, facilitating combination. It 
seems likely that this variable will not 
be related to speed of creative solution 
since it may take a good deal of time 
to get to the mediating links. 


Cognitive or Personality Styles


Previously learned or innately pre- 
disposed methods of approaching prob- 
lems will influence the probability of 
a creative solution. If the requisite 
associational elements of a new and 
useful combination are probable asso- 
ciates of the concrete representations 
of relevant aspects of the problem, an 
individual with a predominately “per- 
ceptual” approach will be more likely 
to reach a creative solution. If, how- 
ever, the requisite associational ele- 
ments are not elicited as responses to 
these concrete representations or if 
there is no concrete representation then 
an individual with a “conceptual” ap- 
proach will be more likely to reach a 
creative solution. 

Another cognitive style of impor- 
tance may lie along the “visualizer- 
verbalizer” dimension. The visualizer 
is one who tends to call up relatively 
complete memorial sensory representa- 
tions of the relevant concrete aspects 
of problems. If the problem deals with 
horses, he tends to picture a horse in 
terms of its sensory qualities. On the 
other hand, the verbalizer explores the 
problem by associating with words 
around the word “horse.” If the req- 
uisite elements are high in his verbal 
associative hierarchy to the word 
horse, the verbalizer will be more likely 
to attain a creative solution; the visualizer
may be thrown off or at least delayed
by many false leads. On the
other hand, if a requisite verbal asso- 
ciative response to the word horse is 
very low, or not present in the ver- 
balizer’s hierarchy, then the visualizer 
will be more likely to attain the crea- 
tive solution. It is therefore clear that 
some types of problems will be solved 
more easily by the visualizer and some 
by the verbalizer. 

Factors such as these (admittedly 
very poorly defined) may be partly responsible
for differential aptitudes for
creative work in differing fields. 


Selection of the Creative Combination 


The creative combination of elements 
is only one among the many which may 
present themselves to the subject. How 
or why is the creative combination se- 
lected? Some speculations regarding 
this problem follow. The explanation 
of the process of selection may be re- 
lated to the nature of the problem. 
Problems either entail a specific and 
relatively objective set of testable cri- 
teria (Paint a realistic portrait of this 
individual. Design a refrigerator so 
that it will be automatically free of 
frost.) or they do not (The chemist 
mixes two liquids out of curiosity. The 
painter dabs hopefully at a fresh canvas 
waiting for an idea. The psychologist 
tosses a new test into a correlation 
matrix). When specific criteria are 
provided, they form an important part 
of the stimulus set which is determining 
which associative elements are being 
elicited and thus becoming eligible for 
entering into combination with other 
elements. Important sets of associa- 
tions to each of these combinations are 
the consequents of the combinations. 
The set of consequents for each com- 
bination (If I put x, y, and z together, 
a and b will happen) is continually 
compared with the set of requirements 
of the problem. When the set of conse- 
quents of a new combination achieves 
a close fit with the set of problem re- 
quirements, this combination is selected. 
When there is complete overlap of sets, 
“search behavior” is terminated. As 
with the other requisite elements of the 
problem, individual differences in this 
case will vary with (among other
things) the structure of the associational
hierarchies around the requirements
of the problem. When the
refrigerator-defroster problem was presented
to an undergraduate class almost
all of the proffered solutions were based
on the principle of ridding the refriger- 
ator of already heavily accumulated 
frost. A couple of individuals (possi- 
bly familiar with the defroster principles 
presently in use) suggested methods 
which disposed of the frost before it 
built up to an overly annoying level. 
In addition to these there were two 
unique responses, i.e., a “new” method
of preventing moisture from condens- 
ing in the freezer compartment, and a 
method of allowing frost to accumulate 
but limiting the location of accumulation
to a small box which could be regularly
and conveniently removed and
emptied. Thus it may be seen that an
individual’s associations to the require- 
ments may be characterized as to their 
stereotypy; the imposed requirements 
of the problem may be viewed as part 
of the requisite elements in the situa- 
tion. The earlier theoretical statements 
concerning these elements may be seen 
as being relevant here. The foregoing 
suggests an explanation of the selection 
process for the case where the subject 
must hunt for a combination of ele- 
ments which will satisfy given criteria. 
In the case where no criteria are speci- 
fied, the subject is typically producing 
random combinations of elements; the 
task of selection in this case consists in 
finding relevant criteria for the given 
partial products. 
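
The selection process for the case with specified criteria can be sketched as a set comparison. In the block below, the function name `select_combination`, the letter labels, and the reading of "complete overlap" as every requirement appearing among the consequents are assumptions made for illustration; the text describes the comparison only in general terms.

```python
def select_combination(requirements, candidates):
    """Compare each combination's consequents with the problem requirements.

    Returns (name, fit) for the best-fitting combination, where fit is the
    fraction of requirements found among the consequents. The search stops
    as soon as a combination covers every requirement.
    """
    best_name, best_fit = None, -1.0
    for name, consequents in candidates:
        fit = len(requirements & consequents) / len(requirements)
        if fit > best_fit:
            best_name, best_fit = name, fit
        if fit == 1.0:  # complete overlap: "search behavior" is terminated
            break
    return best_name, best_fit


# Hypothetical problem: the solution must bring about both a and b.
requirements = {"a", "b"}
candidates = [
    ("x+y", {"a"}),
    ("x+z", {"b", "c"}),
    ("x+y+z", {"a", "b"}),
    ("y+z", {"a", "b", "d"}),  # never examined; search has already ended
]

print(select_combination(requirements, candidates))
```

The order in which candidates are examined stands in for the order in which combinations occur to the subject, which on this account depends on the associative hierarchies around the problem's elements and requirements.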

If we may continue along a bit fur- 
ther with this example of the defroster, 
we may begin to see some glimmerings 
of a solution to the most serious prob- 
lem in research on creative thinking— 
how may we determine to what degree 
behavior is creative? We have sug- 
gested one criterion in our hypothesis. 
In the following an additional criterion 
is developed. To begin with let us ex- 
amine the requirements as originally 
stated—“Design a refrigerator so that 
it is automatically free of frost.” The
first thing that strikes us is that while
some requirements have been stated, 
there are even more that are strongly 
implied and essential, many that are 
desirable, and a number that we would 
only become aware of after some
method of satisfying them had been 
suggested. 

Let us examine some possible solutions:

1. Simply refraining from opening 
the refrigerator door would solve the 
problem as stated since this would pre- 
vent moisture from entering and con- 
densing as frost. This solution meets 
many of the implied requirements. It 
is cheap, convenient, effective, does 
not require special training, etc. 
However, it is not an optimal solution 
since it violates one essential, implied 
requirement—the usefulness of the 
refrigerator must not be impaired. 
(This is the cutting-off-your-nose-to- 
spite-your-face solution.) 

2. A primitive solution is the
hammer-and-screwdriver method. This 
is tried and true and meets many of the 
essential requirements. It falls down 
in that it is inconvenient, messy, un- 
economical (when caked with frost, the 
refrigerator unit is very inefficient), en- 
dangers the mechanism, and is hardly 
automatic. 

3. In a refrigerator we once owned 
another solution was used. The open- 
ing and closing of the refrigerator door 
operated a counter. At a certain count
the refrigerator unit was automatically 
heated and the melted water evaporated 
outside the refrigerator. The superi- 
ority of this solution is immediately ap- 
parent. The source of this superior- 
ity lies in the number of requirements 
which it meets. It is economical, automatic,
convenient, peculiarly appropriate
(the operation of the heating
element is contingent upon the number
of door openings, and the amount of frost
accumulated is also in part dependent
on the number of door openings), does
not interfere with the normal use of the
refrigerator, and does not require special
training. Note that the principle
behind this highly creative solution 
(not allowing massive build-ups of 
frost) was infrequently suggested in 
the classroom group. However, this 
solution is not wholly successful at 
meeting some criteria. The frequent 
heating and cooling may injure frozen 
food stored near the heating element. 
Secondly, since the heating process 
must be brief and mild, it is inevitable 
that not all frost is removed. While 
this solution does effectively curtail the 
number of defrostings, it does not elim- 
inate them completely. It is clear that 
a method which would encompass all of 
the advantages of the “counter” method, 
but which would, in addition, eliminate 
defrosting altogether would be even 
more creative. What is suggested by 
this discussion is that the creativeness 
of a product is some function of the 
number of requirements that the prod- 
uct meets. The most ready application 
of this definition will be in laboratory 
research in which tasks, solutions, and 
requirements may be arbitrarily con- 
structed and varied. 


A TEST OF CREATIVITY


The definition of the creative process 
has suggested a way of testing for individual
differences in creativity. The
test items are intended to require the 
testee to perform creatively. That is, 
he is asked to form associative elements 
into new combinations by providing 
mediating connective links. Since the 
test situation is contrived, the combination
must meet specified criteria that
are experimenter imposed. 

The definition dictates the structure 
of the test. We must provide stimulus
items from two mutually distant realities
and ask the subject to “draw a
spark from their juxtaposition.” To
state it more usefully, we must provide 
stimulus elements from mutually re- 
mote associative clusters and have the 
subject find a criteria-meeting medi- 
ating link which combines them. A 
first problem concerns the type of ma- 
terial of which the stimulus item should 
be composed. If the test is to be appropriate
for all fields of creative endeavor,
the material must either be
nonsensical so as to avoid bias favoring
any specific means of creative expression,
or it must be so common in society
that familiarity could be assumed
to be high across fields of interest. The 
problems involved in constructing the 
nonsense materials so as to avoid favor- 
ing any interest groups soon proved to 
be apparently insurmountable. This 
left us searching for materials with 
which most individuals in the culture 
could claim acquaintance; this, in turn,
brought us to verbal materials. 

While it may be true that certain 
occupational groups have extensive ex- 
perience in dealing with words, there 
are some verbal associative habits that 
could reasonably be assumed to be fa- 
miliar to almost all individuals that 
have been brought up in this (USA) 
culture. Among such habits are the 
associative bonds between words like 
“ham and eggs,” “bed-bug,” “pool-hall,” 
“hound-dog,” “whole-wheat,” “chorus- 
girl,” “kill-joy,” and “red-hot.” These 
became the materials for the test. 

Having decided on the materials, the 
test almost constructed itself in accord- 
ance with the definition. Several words 
from mutually distant associative clusters
must be presented to the subject;
his task must be to provide mediating 
links between them. Further (a factor
of extreme importance), the mediating 
link must be strictly associative rather 
than being of a sort that follows elabo- 
rate rules of logic, concept formation, 
or problem solving. In their final (or
at least present) form, the test items
consist of sets of three words drawn
from mutually remote associative clusters.
One example might be:


Example 1: rat blue cottage 


The subject is required to find a fourth 
word which could serve as a specific 
kind of associative connective link be- 
tween these disparate words. The an- 
swer to Example 1 is “cheese.” 
“Cheese” is a word which is present in 
the word pairs “rat-cheese,” “blue- 
cheese,” and “cottage-cheese.” The 
subject is presented with several ex- 
amples so that he has an adequate op- 
portunity to achieve the specific set 
necessary for the task. 


Example 2: railroad girl class
Example 3: surprise line birthday
Example 4: wheel electric high
Example 5: out dog cat ²


(None of these examples is a test item 
from any form of the actual test.) The 
two college level forms of the test (one 
coauthored by Sharon Halpern and the 
other by Martha T. Mednick) have 30 
items each; the subject is allowed 40 
minutes; his score is the number right. 
The test, called the Remote Associ- 
ates Test (RAT), has some interesting 
correlations with other measures. 
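
The item format and the number-right score can be sketched as follows. The word triples and accepted answers are read off the sample items and their footnoted answers in the text (the pairing of triples 4 and 5 with their answers is this sketch's reading of the sample list); the dictionary layout and the name `score_rat` are assumptions, and none of this reproduces an actual test form.

```python
# Sample items only: (stimulus words, set of accepted mediating words).
SAMPLE_ITEMS = {
    1: (("rat", "blue", "cottage"), {"cheese"}),
    4: (("wheel", "electric", "high"), {"chair", "wire"}),
    5: (("out", "dog", "cat"), {"house"}),
}


def score_rat(responses, items=SAMPLE_ITEMS):
    """Return the number right: items whose response is an accepted answer."""
    right = 0
    for number, (_, accepted) in items.items():
        answer = responses.get(number, "").strip().lower()
        if answer in accepted:
            right += 1
    return right


# A hypothetical subject who solves items 1 and 4 but misses item 5.
print(score_rat({1: "Cheese", 4: "wire", 5: "pound"}))
```

Allowing a set of accepted answers per item reflects the sample item for which either of two words is counted as correct.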
Comparisons with Criteria. A study 
was conducted at the College of Archi- 
tecture, University of California, Berke- 
ley, by the writer and Sharon Halpern. 
Ratings of creativity by faculty members
of the College who taught the Design
courses were correlated with RAT
scores. These ratings form an unusually
excellent criterion of creative performance
since the raters had been
advising and evaluating the students in
the creation of new designs and models
of structures. They had been working
with these students for at least a year
and in many cases two or more. The
ratings and RAT scores correlated significantly
(r = .70, df = 19, p < .01).
In this study an early form of the RAT
was used.

2 Answers to sample RAT items: 2. working;
3. party; 4. chair or wire; 5. house.

The RAT was administered to a
group of first year psychology graduate 
students at the University of Michigan 
whose native language was American 
English (N = 35). Faculty research 
supervisors (who had been directing 
the independent research efforts of the 
students) rated the eight highest and
eight lowest RAT scorers either “high” 
or “low” in research creativity (no 
middle category allowed). Research 
creativity was defined as being demon- 
strated if the student developed new 
research methods and/or pulled to- 
gether disparate theory or research 
areas in useful and original ways. Of 
the 16 research supervisors, one felt 
that he had not had enough contact 
with his student to make the judgment. 
His student was a low RAT scorer. 
Of the eight high RAT scorers, six 
were rated high on research creativity 
and two were rated low; of the seven 
low RAT scorers, only one was rated 
high, the other six being rated low. 
By Fisher’s exact test the probability 
of these events occurring by chance is 
less than .05. Miller Analogies Test 
(MAT) scores were available for these 
students. Of the seven high MAT 
scorers, three were rated high on re- 
search creativity; of the eight low 
MAT scorers, four were rated high in 
research creativity. 
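
The Fisher exact probability for the RAT comparison can be recomputed from the counts given above (eight high scorers, of whom six were rated high; seven low scorers, of whom one was rated high). The sketch below uses only the standard library; whether the original analysis was one-sided or two-sided is not stated, so both are computed, and the function name is an assumption.

```python
from fractions import Fraction
from math import comb


def table_probability(k, high_scorers=8, rated_high=7, n=15):
    """Probability that k of the high scorers are rated high, margins fixed."""
    return Fraction(comb(rated_high, k) * comb(n - rated_high, high_scorers - k),
                    comb(n, high_scorers))


observed = 6  # high RAT scorers rated high on research creativity

# One-sided: tables at least as extreme in the observed direction.
one_sided = sum(table_probability(k) for k in range(observed, 8))

# Two-sided: all tables no more probable than the observed one.
p_obs = table_probability(observed)
two_sided = sum(table_probability(k) for k in range(0, 8)
                if table_probability(k) <= p_obs)

print(float(one_sided))  # about 0.032
print(float(two_sided))  # about 0.041
```

Both values fall below .05, in agreement with the statement in the text.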

Reliability. The Spearman-Brown 
reliability of the RAT was .92 in one 
sample (289 women, almost all the 
students at an Eastern women’s college, 
tested as part of a project under the di- 
rection of Theodore Newcomb) and 
.91 in another (215 men tested at the 
University of Michigan as part of a 
project under the direction of Warren 
T. Norman).
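
For readers unfamiliar with the Spearman-Brown formula, the sketch below shows the standard step-up from a part-test correlation to full-test reliability, and the half-test correlation that a full-test value of .92 or .91 would imply. That the reported figures came from a two-half split is an assumption here; the text reports only the resulting reliabilities.

```python
def spearman_brown(r, k=2.0):
    """Predicted reliability of a test lengthened k times: k*r / (1 + (k-1)*r)."""
    return k * r / (1 + (k - 1) * r)


def implied_half_test_r(r_full):
    """Invert the k = 2 case: the half-test correlation implied by r_full."""
    return r_full / (2 - r_full)


for r_full in (0.92, 0.91):
    r_half = implied_half_test_r(r_full)
    print(r_full, round(r_half, 3), round(spearman_brown(r_half), 3))
```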


Correlation with Grades. One of the
present forms of the RAT correlated 
negatively with the first-two-year grade 
point averages of a group of under- 
graduates at a large Eastern technology 
college (r = -.27, N = 74, p < .05).
This same correlation was obtained 
with the summer grades of a smaller 
group, N = 34, of summer students at
a large Eastern liberal arts college (not 
statistically significant in this case). 
In a study by Miller (1960) it was 
found that high RAT scorers tended to 
get higher grades from teachers rated 
as flexible than from teachers rated as 
dogmatic. Low RAT scorers received 
higher grades from teachers rated dog- 
matic than from teachers rated as 
flexible. 

Correlation with Social Attitudes
and Occupational Interests. It is clear
that creative individuals must have
access to improbable associative
responses. Kowalski (1960) hypothesized
that this is a general tendency
which also manifests itself in their
attitudes and interests. She tested and
interviewed 15 high RAT scoring and
15 low RAT scoring undergraduate
women. The two groups had radically
differing views on sexual morality and
women’s rights. The views of the high
creatives were more atypical and
“liberal” (U = 37, p < .001). On
the Strong Vocational Interest Blank,
Men’s Form (SVIB), the high creative
group showed “significantly higher
interest on the artist (p < .05),
psychologist (p < .005), physician (p < .025),
mathematician (p < .025), and
author-journalist (p < .05) keys. The low
creative group showed higher interest
on the farmer (p < .05), math-physical
science high school teacher (p < .05),
office man (p < .05), and pharmacist
(p < .01) keys” (p. 19). (These are
the probability values of obtained chi
squares.) The only one of these keys
related to ACE scores was that of
physician. “The commonality of these
interest patterns was evaluated by
noting the per cent overlap of the
specific key with the general population
of interest expression. For example,
the Farmer key overlaps 45%
with the general population, while the
Artist key overlaps 20%” (p. 20).
The significant keys of the higher
creatives had significantly less
commonality than the significant keys
of the low creatives. These differences
were independent of the influence
of intelligence as measured by the ACE.

Associative Behavior. In the discussion
of illustrative predictions it was
suggested that highly creative individuals
would be characterized by a flat
associative hierarchy rather than a steep
associative hierarchy. Further, it was
proposed that the greater the number
of associations that an individual has to
the requisite elements of a problem, the
greater the probability of his reaching
a creative solution. From these two
independent statements it may be
deduced that when required to display his
reservoir of associations to single
stimulus words, the highly creative
individual will have greater access to less
probable associates and therefore
produce a greater number of associates. A
study by Craig and Manis (1960,
unpublished 3) supports this deduction.
Thirty-eight college students had the
RAT and an associative task
administered to them. In the associative task
they were given 1 minute to write as
many associates as they could to each
of 20 words. The correlation of the
number of such associates with RAT
scores was .38 (p < .01).

In two related studies, Karp (1960)
and Kowalski (1960) found RAT
scores to be directly related to the
originality and quantity of anagrams
constructed using the test word
“Generation.” In the Karp study 40
undergraduates were given 5 minutes to
produce as many four letter anagrams from
the test word as they could. The
productions were scored for quantity
(number of acceptable answers) and
originality (a weighted score for each
response was developed from the
frequency with which the response was
given by the 40 subjects). The
correlation of the RAT with the quantity
scores was .44 (p < .01); the
correlation of the RAT with the originality
score was .37 (p < .05). Kowalski
presented the anagrams task to 15 high
RAT scorers and to 15 low RAT
scorers, giving them 5 minutes to
produce words of any length from the test
word “Generation.” In this study
originality was measured by computing the
percentage of responses given by an
individual which had not been given by
any other of the 30 subjects. The
difference on this measure between high
and low RAT scorers was significant
(U = 68, p < .04). “Only four
subjects in the low creative group gave
any original responses at all while
eleven subjects in the high creative
group did” (p. 19).

3 Craig, M., & Manis, M. Prediction of
scores on the Remote Associates Test by
size of response repertoire. Unpublished
manuscript, 1960.
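
The originality measure described for the Kowalski study (the percentage of a subject's responses given by no other subject) can be sketched as follows. This is an illustration under stated assumptions, not the original scoring procedure: the function name and the three response lists are hypothetical, and repeated words within one subject's list are counted once.

```python
from collections import Counter


def uniqueness_scores(responses_by_subject):
    """Percentage of each subject's distinct responses that nobody else gave."""
    distinct = {
        subject: {word.lower() for word in words}
        for subject, words in responses_by_subject.items()
    }
    # How many subjects produced each word.
    givers = Counter(word for words in distinct.values() for word in words)
    scores = {}
    for subject, words in distinct.items():
        if not words:
            scores[subject] = 0.0
            continue
        unique = sum(1 for word in words if givers[word] == 1)
        scores[subject] = 100.0 * unique / len(words)
    return scores


# Hypothetical anagram responses to the test word "Generation".
example = {
    "s1": ["rate", "gone", "orange", "integer"],
    "s2": ["rate", "gone", "tear"],
    "s3": ["rate", "near"],
}
scores = uniqueness_scores(example)
print(scores)
```

The weighted originality score used in the Karp study is a different measure and is not reproduced here.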

At the Institute of Personality
Assessment and Research, University of
California, Berkeley, the RAT was
included as part of the assessment battery
administered to a group of 40 highly
eminent architects. The RAT
correlated .31 with the Originality (O-1)
Scale of the IPAR Questionnaire Scale
and -.31 with the total Conformity
Score obtained in the Crutchfield
Conformity Experiment (Crutchfield,
1955). Interviewers rated high scorers
as significantly higher in “graceful and
well-coordinated in movement” and
“reticent and taciturn in speech.” The
college grade point average which the
subjects reported correlated -.34 with
RAT score, a result which tends to
confirm findings reported above.


EXPERIMENTALLY MANIPULABLE
VARIABLES


While only one experimental study 
(described below) which makes use of 
this general framework has been com- 
pleted in this laboratory, it may be use- 
ful to indicate briefly the kinds of ex- 
perimental investigation it suggests. 

Massed vs. Distributed Work Sessions.
Total time of work being equal,
massed sessions of creative work should
be more successful than distributed
sessions. There are two reasons why this
would be so. The first is that the
individual making use of the massed session
technique is more likely to achieve
temporal contiguity of the requisite
associative elements within a single intensive
work period than is an individual who
has distributed his work in shorter
periods over several days. Secondly, it
may take some time for an individual
to work on a problem enough to go
beyond its obvious aspects. In the first
hour of work, he may get through only
the conventional and stereotyped
associations to the elements of the problem,
while it is perhaps in the later stages of
intensive work on a problem that one
can begin to entertain the more remote
associations that are evoked by
elements of the problem. It is, of course,
among these remote associations that
the key to the creative solution will lie.

Warmup. In creative work a warmup
session should serve to arouse the more
remote associations to the requisite
elements of the problem. While their
work has gone considerably beyond the
problem of warmup, Maltzman,
Bogartz, and Breger (1958) have
demonstrated that the repeated elicitation of
different word associations to the same
stimulus words does indeed tend to
produce remote associations to these
stimulus words. Further, this induced
originality tends to transfer to other
relatively unrelated tasks presented
after this associative warmup.
Associative warmup of this type should
become more effective as the warmup
stimuli are more similar to the task
materials. It may be that the effects of
warmup will prove to be a further
advantage that massed sessions have over
distributed sessions for creative
productivity.


Stereotyping Associative Responses. 
As stated above, if an individual’s as- 
sociative response to a stimulus element 
of a creative problem is of excessive 
strength, this will tend to reduce the 
likelihood of occurrence of more remote 
associative responses. This will reduce 
the probability and speed of creative so- 
lution. It would therefore be predicted 
that extensive training of a specific re- 
sponse other than a requisite one to a 
stimulus element of a problem requiring 
a creative solution should retard later 
attempts at solution of the problem. 
This prediction is related to the
concept of “functional fixedness”
introduced by Duncker (1945). Birch and
Rabinowitz (1951) and Adamson and
Taylor (1954) completed experiments
which are related to this prediction.
Their test situation was the two string
problem. The subjects are asked to
tie together two strings suspended from
the ceiling. When the subject grasps
one string he finds that the other string
is hanging out of his reach. The
solution to the problem requires the
subject to attach a weight to one of the
strings, get the weight swinging and
catch it while holding the other string.
Various objects can be used as a
weight. The subjects that had been
pretrained by having them use a switch
in its usual manner of functioning
tended to be unlikely to use it as a
weight. They had developed strong
response strength for the association
“switch-close circuit” which had
reduced the probability of the remote
association “switch-weight.”

Another feasible experimental
approach would make use of the RAT
item as a creative task and test the
influence upon it of certain variables.
For example, the words of which an
item is composed may be presented at
varying rates to test the massed trials
hypothesis. In addition, various
pretraining conditions may be evaluated in
terms of their effectiveness in increasing
or decreasing RAT performance.

Another possible experimental
approach would entail separating out high
and low RAT scorers and observing
the differential effect of certain
variables upon their behavior. In an
experiment just completed Houston and
Mednick (in press) postulated that an
important motive impelling the
behavior of the creative individual was a
need for improbable associative
stimulation. It was reasoned that if such
stimulation were supplied, it would
tend to satisfy this need and be
reinforcing. Further, if such stimulation
regularly followed a given response the
high creative individual should tend to
learn that response. Thirty high and
30 low RAT scorers were asked to
read aloud only one of two typed
words on a 3 × 5 card. Excepting
buffer items and including 40 pairs
aimed at gauging the free operant
level of noun-choice, there were 160
pairs of words, each pair consisting of a
noun and a nonnoun (verb, adjective,
adverb, etc.). If a subject in the
experimental group (15 high RAT subjects,
15 low RAT subjects) responded with
the noun member of a pair, the
experimenter responded with an improbable
association; if the subject chose the
nonnoun, the experimenter responded
with the most probable association. In
the control group (15 high RAT, 15
low RAT) both the nouns and the
nonnouns were invariably followed by
their most probable associate.
Associative probabilities were obtained from
the Russell and Jenkins (1954) and
Deese (1960) norms. If the
improbable response was satisfying a need,
the probability of noun-choice should
increase over the 160 trials. It did
significantly in the high RAT
experimental group; the low RAT
experimental group showed a decrease. The
high and low RAT control groups
showed no reliable change.


Some of the positions which have 
been taken in this paper are assump- 
tions and not deductions. As more 
data are gathered some of these as- 
sumptions will assume the status of 
facts, some will be revised. For ex- 
ample, the opening paragraph suggests 
that the paper is not meant to apply 
only to one field of creative endeavor 
but attempts to delineate processes 
that underlie all creative thought. This 
may require modification. The expla- 
nation may fit the process of scientific 
discovery and not be appropriate to 
discussions of painting or music. For 
the present (paradoxically enough), the
more encompassing assumptions seem 
more parsimonious. It may eventually 
turn out (as is hinted at in the body of 
the paper) that the differences between 
the fields are more determined by dif- 
ferences in suitability of the three 
means of achieving contiguity, i.e., 
serendipity, similarity, and mediation. 


SUMMARY


An associative theory of creative
thinking has been outlined. Differences
between high creatives and low
creatives have been predicted along
specified dimensions. Predictions have been
made regarding the effect on the
creative process of some experimentally
manipulable variables.

The associative definition of the
creative process has taken the operational
form of a test. Some preliminary
research with this test is described.


REANALYSIS OF “MEANINGFULNESS AND VERBAL
LEARNING” 


RONALD C. JOHNSON 


University of Hawaii 


Letter combinations such as nonsense 
syllables and trigrams (three letter 
combinations that may or may not form 
words or syllables) have a number of 
characteristics found to influence rate of 
learning. Those characteristics that
have been investigated most thoroughly
may all be referred to as frequency
variables. These variables are:

1. Frequency, per se: A number of
possible measures of frequency exist.
Some trigrams form words. For these
trigrams, Thorndike-Lorge (1944)
frequencies can be obtained. Some
trigrams and nonsense syllables form
sounds or syllables that occur in the
English language, and it is possible to
obtain some measure of the frequency
of this occurrence. The HAV-LAV
differentiation of nonsense syllables
according to whether or not they occur as
the first three letters of English words
is essentially a frequency measure. It
has proven useful in learning experi- 
ments in which nonsense syllables are 
used as stimuli (e.g., Lindley, 1960) 
and might also serve as a trigram fre- 
quency measure. Finally, the frequency 
of a trigram or nonsense syllable might 
be defined as the frequency that the 
three letters of any given stimulus oc- 
cur in contiguity in English words. 

2. Association value (a): The a of 
a stimulus is generally established by 
determining the proportion of subjects 
who have an association (even if they 
cannot tell what this association is) to 
a stimulus within a rather short (2-4
second) time interval. Association
value does not directly measure the
number of associations evoked by a
stimulus, although syllables that evoke
an association in a high proportion of 
subjects generally evoke more associa- 
tions, if presented for a longer time 
interval, than do syllables for which few 
subjects have any associations. 

3. Meaningfulness (m): Noble 
(1952) has proposed as a measure of 
m the mean number of associations 
(excluding clang and certain other cate- 
gories of association) that a verbal 
stimulus can evoke from subjects in a 
fixed time interval. While a is estab- 
lished by determining whether a stimu- 
lus can evoke an association, m is 
arrived at from the mean number of 
these associations. 
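
The difference between a and m can be made concrete with a small sketch. It is illustrative only: the function names and the four-subject data are hypothetical, and the two measures are shown with separate inputs because, as described above, they come from different procedures (a yes-or-no report in a short interval for a, a count of associations in a fixed interval for m).

```python
def association_value(had_association):
    """a: proportion of subjects reporting any association at all."""
    return sum(1 for flag in had_association if flag) / len(had_association)


def meaningfulness(association_counts):
    """m: mean number of acceptable associations per subject."""
    return sum(association_counts) / len(association_counts)


# Hypothetical data for one stimulus, four subjects.
reported_any = [True, True, False, True]   # short-interval task
counts = [3, 0, 2, 5]                      # fixed-interval production task

a_value = association_value(reported_any)
m_value = meaningfulness(counts)
print(f"a = {a_value:.2f}, m = {m_value:.2f}")
```

Two stimuli can share the same a while differing widely in m, which is one reason for not treating the measures as interchangeable.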

All of these frequency variables in- 
fluence rate of verbal learning. All of 
these variables are substantially related 
to one another. The frequency of oc- 
currence of nonsense syllables as three 
letter sequences in words is related to 
the a of these syllables (Underwood, 
1959), and the Thorndike-Lorge fre- 
quency of words is closely related to m 
(Johnson, Frincke, & Martin,
unpublished 1; Underwood, 1959).
Association value (a) and m are also related
(Noble, Stockwell, & Pryor, 1957).
While these three measures are related,
it seems dangerous to assume that
because of this relation they are
synonymous. Aside from nonequivalent
operations used in obtaining these measures,
we have ample precedent in studies of
the relation of m and familiarity
(Noble, 1953) and of m and a (Noble,
Stockwell, & Pryor, 1957) for
suspecting it to be in error to lump these
measures together as equivalent or
synonymous.

1 Johnson, R. C., Frincke, G., & Martin,
Lea. Meaningfulness, frequency, and
affective character of words as related to response
measures. Unpublished manuscript, 1960.

The most ambitious recent study of
verbal learning is probably that of
Underwood and Schulz (1960) as
described in their book, Meaningfulness
and Verbal Learning. A number of the
experiments reported by Underwood
and Schulz involve the learning of lists
of trigrams by various groups of
subjects. For Underwood and Schulz, a
trigram consists of any three successive
letters in a word. For example, the
word “learning” contains the following
trigrams: LEA, EAR, ARN, RNI, NIN, ING.
Trigrams were obtained from a large
sample of words. Trigram frequency
measures were obtained, with frequency
being defined as the number of times that
the various three letter combinations
occurred in contiguity in the sample of
words from which they were obtained.
The term meaning or M is used
interchangeably to denote meaningfulness
(m) and association value (a). While
frequency and M are discussed
separately by Underwood and Schulz, the
assumption that they appear to make is
that the relation between these two (or
three) variables is so close that separate
analyses of the data regarding these
variables are unnecessary.
Pronunciability ratings were obtained from
samples of subjects for those trigrams used
in a number of learning experiments.
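
The contiguity definition of trigram frequency described above can be sketched in a few lines. The three-word sample and the function names are hypothetical; Underwood and Schulz's actual word sample and counting rules are not reproduced here.

```python
from collections import Counter


def trigrams(word):
    """All runs of three successive letters in a word, in order."""
    letters = word.upper()
    return [letters[i:i + 3] for i in range(len(letters) - 2)]


def trigram_frequencies(words):
    """Count how often each trigram occurs contiguously in a word sample."""
    counts = Counter()
    for word in words:
        counts.update(trigrams(word))
    return counts


print(trigrams("learning"))

# Hypothetical miniature word sample.
sample_counts = trigram_frequencies(["learning", "earning", "ring"])
print(sample_counts.most_common(3))
```

The first call reproduces the six trigrams listed for “learning.” Note that this count treats a word such as ING in “ring” and a letter run such as RNI in “learning” alike, which is the point at issue in the reanalysis that follows.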

The trigram experiments center 
around the problem of determining 
whether frequency-M or pronunciabil- 
ity can account for the majority of 
variance in rate of verbal learning. In
the first of Underwood and Schulz’s
trigram learning experiments
(Experiment 6, p. 144ff.), four lists of eight
trigrams each were learned by various
samples of subjects. Taking the lists
as a whole, frequency was significantly
related to rate of learning, but
pronunciability showed an even closer
relation. Three lists of 12 trigrams each
served as stimuli in Experiment 11
(p. 169ff.). No significant relation
was obtained between trigram
frequency and rate of learning. The
relation between trigram pronunciability
and rate of learning was highly
significant. Underwood and Schulz say,
“Clearly, we have no alternative but to
conclude that for these lists the relation
between frequency and learning is
essentially zero” (p. 171). Other
experiments yielded similar results and
Underwood and Schulz state that M is
unrelated to rate of learning. They say:


Rather, we believe that the apparent causal
status of M derives from a certain amount of
covariation with pronunciability, and if this 
covariation is removed, only the relationship 
between pronunciability and learning will 
hold up (p. 192). 

(This removal of covariance does
indeed reduce the relation of frequency
(and M) to approximately zero; e.g.,
see p. 194. The question, in partial
correlation, is what variable to partial
out.) They again refer to
pronunciability, saying:


It appears then, quite unexpectedly and quite
without theoretical direction, that we have
stumbled upon an attribute of verbal units
which has more predictive power by far than
any other attribute we have discovered or
methodically set about to measure. (p. 197)
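
The remark that the question in partial correlation is what variable to partial out can be illustrated with the first-order partial correlation formula. The three correlations used below are hypothetical values chosen only to show the asymmetry; they are not taken from Underwood and Schulz.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


# Hypothetical zero-order correlations.
r_freq_learn = 0.40   # frequency with rate of learning
r_pron_learn = 0.70   # pronunciability with rate of learning
r_freq_pron = 0.55    # frequency with pronunciability

# Frequency and learning, pronunciability partialed out.
freq_given_pron = partial_correlation(r_freq_learn, r_freq_pron, r_pron_learn)
# Pronunciability and learning, frequency partialed out.
pron_given_freq = partial_correlation(r_pron_learn, r_freq_pron, r_freq_learn)

print(f"frequency | pronunciability: {freq_given_pron:.3f}")
print(f"pronunciability | frequency: {pron_given_freq:.3f}")
```

With two correlated predictors, removing the stronger one from the weaker can leave almost nothing, while the reverse leaves a good deal; the arithmetic alone does not say which variable is the more basic.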


Further investigations, using gener- 
ated trigrams, confirmed the belief that 
of the different variables studied, pro- 
nunciability was the most closely re- 
lated to rate of learning. Underwood 
and Schulz’s rejection of frequency, 
meaningfulness, and association value 
as major variables in verbal learning is 
indeed astounding since this rejection 
runs counter to the previous interpreta- 
tions of a most extensive body of data. 
It is, therefore, something of a
disappointment to find that Underwood and
Schulz once more accept frequency as 
the key variable in verbal learning on 
p. 289ff. They say that if one defines 
trigram frequency as the frequency that 
the three-letter combination occurs as a 
sound in the English language, rather 
than using frequency of contiguity as a 
measure of frequency, then frequency 
may serve as an adequate prediction of 
rate of verbal learning. 

As pointed out in this paper, there
are a number of possible measures of
trigram frequency. One does not have
to be too gestalt oriented to have
considered the possibility that the
frequency with which trigrams appear as
units of English speech is a better
measure of frequency, in terms of predictive
power, than is frequency of contiguity
of three-letter sequences. While
Underwood and Schulz, in their analysis,
define frequency in terms of contiguity,
this writer will define frequency as the
frequency of occurrence of trigrams as
discrete or relatively discrete sound
units in spoken English. The purpose
of this paper is to reanalyze some of
Underwood and Schulz’s data from this
point of view. Experiments 6 and 11
seem representative of their trigram
learning experiments and shall form
the basis of the major portion of this
reanalysis.


EXPERIMENT 1 


One syllable words form relatively 
discrete sound units. Sounded syllables 
in words form somewhat less discrete 
units, while three letter combinations 
that are neither words nor sounded syl- 
lables are not sound units of English 
speech. It is relatively simple to select 
the three letter words from trigram lists 
used in Experiments 6 and 11. The 
one exception to this simplicity in
selection is the trigram “QUE,” used in
Experiment 6, which does not appear in
Webster’s New World Dictionary of the
American Language, but does appear
among the 30,000 most frequent words
in the Thorndike-Lorge tables. The
writer decided, on the basis of its
relatively high frequency, to count QUE as
a word. The next problem was to
separate syllables from nonsyllables. The
purpose of the first experiment
reported in this paper was to separate
those trigrams that occur as syllables
(sound units that do not, in themselves,
form words) from those that do not
occur as sound units, and to determine
whether there are differences in the
rate that words, syllables, and
nonsyllables are learned.


Method 


Subjects. The subjects were 51 upper di- 
vision psychology students. 

Materials. The materials were those 52 
trigrams from Underwood and Schulz’s Ex- 
periments 6 and 11 that do not form English 
words. 

Procedure. The subjects were presented 
with the trigrams in a double column on a 
single dittoed page in the same order that 
these trigrams appeared in Underwood and 
Schulz’s book. They were told: 

Look at the first of these three letter
combinations. Say it to yourself. Does this
combination occur as a sounded syllable in
English words? If it does, put a check
after it. Do this for all three letter
combinations on the page.


Results and Discussion 


Whether due to perversity or to
misunderstanding, such improbable
combinations as CQU were described as
syllables by several subjects. For this
reason it was decided to count
three-letter combinations as syllables
appearing in the English language only if
rated as being syllables by at least five
subjects. The words, syllables, and
nonsyllables from Experiments 6 and
11 are as follows:
EXPERIMENT 6

Words: WAS, BUT, MAN, WHO, ART,
STY, VIZ, QUE
Syllables: ZON, CYR, STI, JOK, EIG, SOU,
VER, YAL, CHA, ING
Nonsyllables: RCH, XPE, IFO, TLY, DGM, XPE,
GHT, ULD, CKB, MPT, NDF, XPO,
MBK, DFL


EXPERIMENT 11

Words: BOY, URN, HER, HAT, CUT, ITS,
PAR, LED
Syllables: WHE, PLO, VIF, ING, EST, SOU,
FRO, UND, VER, CHI, ENT, JUM,
ROP, ISH
Nonsyllables: CQU, NDE, OUS, WHA, ATI, YLV,
YIN, ALI, OMP, MPA, DGM, ABL,
NCE, TIO

Kruskal and Wallis H tests show
that the words, syllables, and
nonsyllables used in Experiment 6 differ
significantly in pronunciability (H = 21.7,
df = 2, p < .001) and in mean
correct anticipations (H = 15.3, df = 2,
p < .001). The words, syllables, and
nonsyllables in Experiment 11 differ
significantly in pronunciability (H =
24.2, df = 2, p < .001) and in mean
correct anticipations (H = 532 [sic], df =
2, p < .005).
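
For readers who wish to see how an H statistic of this kind is obtained, a minimal sketch follows. It assumes the uncorrected form of the Kruskal-Wallis statistic (no adjustment for ties) and uses the fact that, with df = 2, the chi-square tail probability is exactly exp(-H/2). The function names and the three small groups are hypothetical, not the trigram data.

```python
from math import exp


def midranks(values):
    """Ranks from 1 to n, tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), without tie correction."""
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)


def chi_square_tail_df2(h):
    """Upper-tail probability of chi-square with 2 degrees of freedom."""
    return exp(-h / 2.0)


# Hypothetical scores for three categories (e.g., words, syllables, nonsyllables).
h_value = kruskal_wallis_h([[7, 8, 9], [4, 5, 6], [1, 2, 3]])
print(f"H = {h_value:.2f}, p about {chi_square_tail_df2(h_value):.4f}")
```

By the same tail formula, the H values of 21.7, 15.3, and 24.2 reported above all correspond to probabilities well below .001.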

If we divide the words, syllables, and 
nonsyllables of each list into these three 
categories and divide the trigrams in 
each category into those above and be- 
low the median in pronunciability for 
their group, we obtain the data pre- 
sented in Figure 1. 

It is apparent that Underwood and
Schulz, in obtaining pronunciability
ratings of these trigrams, ranged words,
syllables, and nonsyllables along a
continuum, while the experimenter, using
another technique, divided them into 
three discrete groups. The significance 
figures show that the two measures are 
substantially related. The method used 
in differentiating between these tri- 
grams would depend on the definition 
and the measure of trigram frequency. 

It seems clear, from this graphic
representation as well as from the H tests,
that trigrams that form words are
learned more rapidly than trigrams that
form syllables (a somewhat less
discrete unit of speech), contrary to
Underwood and Schulz’s belief as stated
on p. 284, and that trigrams that form
syllables are learned more rapidly than
are trigrams judged by subjects as not
forming syllables in the English
language. When the data are analyzed in
this way, pronunciability no longer
appears as a major variable.

[Figure 1. Mean percentage of correct
anticipations for nonsyllables, syllables,
and words above and below median in
pronunciability.]

Whether frequency or pronunciability
is the predictor of rate of learning is
a result of the definition of frequency
used. If frequency is defined as the
number of contiguous occurrences of
three-letter combinations, then
pronunciability seems to be the significant
variable. If frequency is defined as
the number of occurrences of three
letter combinations at different levels
of discreteness in the English language,
then frequency seems to be the most
significant variable while
pronunciability merely serves as an indicator as
to whether a trigram is a word, syllable,
or nonsyllable.


EXPERIMENT 2 


Underwood and Schulz, on p. 280, 
state that the frequency (of contiguity) 
hypothesis broke down most completely
in multiple letter units in the broad 
range between the extremes of well in- 
tegrated and poorly integrated units. 
Most of those trigrams rated by the 
subjects as appearing as syllables in the 
English language fall into this middle 
range of integration. For this reason, 
the experimenter investigated further 
the characteristics of those trigrams 
used in Experiments 6 and 11 that were 
rated as being syllables. The experi- 
menter, using the same subjects that 
had rated the trigrams of Experiments 
6 and 11 and using the same procedure, 
also determined which of the trigrams 
used in Experiment 15 (p. 245ff.)
formed syllables. Five or more
subjects rated each of the following
trigrams used in Experiment 15 as
occurring as syllables in the English
language: ING, CED, JAD, ITE, SUL, NUW,
MEL, DOK, NOP, SOM.


Method 


Subjects. The subjects were 15 introduc- 
tory psychology students. 

Materials. The trigrams from Experi- 
ments 6, 11, and 15 that formed syllables but 
did not form words were placed on dittoed 
sheets, in random order, 10 trigrams to a 
page. 

Procedure. The sheets were passed out to 
the subjects, face down. The subjects were 
cautioned to leave the sheets face down till 
they were told to turn them over. The ex- 
perimenter then told them: 


What I want you to do is help me in a
learning experiment. You have a list of
three-letter combinations that form
syllables. I am going to give you 20 seconds
for each syllable. In that 20 seconds I
want you to say the syllable to yourself
in as many ways as you wish. Then I
want you to write down, following the
syllable, all the words you can think of
that contain the sound of the syllable. For
example, if the first syllable was L-O-G,
then you would say this three-letter
combination to yourself and then write down
the words that contain L-O-G, as it is
sounded, for example, log, logic,
psychology, logey, and logger. [This example
was put on the blackboard.] You are to
do this for each syllable. When I say
“start” turn your paper over and begin with
the first syllable. When I say “stop” stop
writing. When I say “start” again, begin
with the second syllable. Go through the
entire list in this way. Do not look ahead.


The subjects signified that they understood 
the directions and were administered the list. 
They were timed by stop watch. Their re- 
sponses were scored. 


Results and Discussion


The mean number of words
produced for each of the syllables is
presented in Table 1.

The rank-order correlations between
the number of words produced for each
syllable and the mean number of
correct anticipations made by Underwood
and Schulz’s subjects for the same
syllable were +.48 for the syllables used
in Experiment 6, +.64 for the syllables
used in Experiment 11, and +.75 for
the syllables used in Experiment 15.
The last two of these correlations are
significant beyond the .05 level of
confidence. The responses to this production
technique seem somewhat different
from, yet associated with, measures
of association value, frequency, and
meaning. The number of words that
the subjects can produce that contain a
syllable seems closely associated with
the rate at which the syllable is learned.


TABLE 1

MEAN WORDS PRODUCED FOR
TRIGRAM SYLLABLES

Experiment 6    Experiment 11            Experiment 15
                Syllable  Mean words     Syllable  Mean words
                          produced                 produced

[illegible]     PLO       1.93           ING       4.93
                VIF       0.87           CED       2.00
                ING       4.93           JAD       1.00
                EST       3.67           ITE       2.80
                WHI       [illegible]    SUL       2.33
                SOU       [illegible]    NUW       0.40
                FRO       [illegible]    MEL       2.47
                UND       [illegible]    DOK       [illegible]
                VER       [illegible]    NOP       [illegible]
                CHI       [illegible]    SOM       [illegible]
                ENT       [illegible]
                JUM       [illegible]
                ROP       [illegible]
                ISH       [illegible]
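
A rank-order correlation of the kind reported here can be computed as the product-moment correlation of the two sets of ranks, which also handles ties. The sketch below is illustrative: the function names are hypothetical, and the six pairs of values are invented for the example, not taken from Table 1 or from Underwood and Schulz.

```python
from math import sqrt


def midranks(values):
    """Ranks from 1 to n, tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Rank-order correlation: Pearson correlation of the midranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    rx, ry = midranks(x), midranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical values for six syllables.
words_produced = [4.9, 2.0, 1.0, 2.8, 2.3, 0.4]
correct_anticipations = [14.0, 9.5, 6.0, 11.0, 10.0, 7.5]
rho = spearman_rho(words_produced, correct_anticipations)
print(f"rho = {rho:.3f}")
```

With only 10 to 14 syllables per experiment, as here, a coefficient must be fairly large to reach significance, which is consistent with +.48 falling short while +.64 and +.75 do not.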


CONCLUSION 


Underwood and Schulz, defining fre- 
quency as the frequency that trigrams 
appear contiguously in English words, 


found pronunciability to be significantly 
related to rate of learning, while fre- 
quency was not a major variable. This 
experimenter defined frequency as the 
frequency that a trigram appeared as a 
discrete or relatively discrete sound unit 
in spoken English. Using the approach 
taken in this paper, frequency is most 
significantly related to rate of verbal 
learning while pronunciability is not a 
major variable. While it may be that
other attributes of trigrams (such as
m or a) which covary with the
frequency that trigrams occur as sounds in
the English language are more basic in
influencing rate of verbal learning, one
can say, with certainty, that frequency,
as defined herein, is an adequate
predictor of rate of verbal learning.


As Stricker (1961) notes, throughout
our study we (Johnson, Thomson, &
Frincke, 1960) equate word value with
the rating of the word on the good-bad
scale of the semantic differential. He
says, further, that:


word value has been used in the literature to 
represent a dimension different from that 
tapped by the evaluative scale of the seman- 
tic differential, and while the two may share 
some commonality, it does not seem entirely 
appropriate to use the latter in order to 
operationally define the former. 


Stricker then demonstrates that word value, as defined in previous
visual duration threshold studies, refers to a motivational (e.g.,
Allport-Vernon) and not an attitudinal (e.g., semantic differential)
dimension, and that motivational and attitudinal dimensions of
value (?) or evaluation (?) or “X” are probably not synonymous. He
says that while we demonstrate that there is a relation between word
goodness and word frequency (and also between both goodness and
frequency and threshold), we do not demonstrate a relation between
word frequency and word value (as Stricker prefers to define value).

We cannot deny that the operational 
definition of “value” varied between our 
study and that used in previous studies 
conducted in this area. As we see it, 
science consists, to a large degree, of 
figuring out ways of testing the previously
untestable. If operationally defining black
as white aids the scientist in this endeavor,
then it is quite legitimate
to do so, so long as you make your defi- 
nitions clear. While the word “value” 
has been used in the past to refer to other 
dimensions than those investigated in our 
study, we do not believe that we, or 
others, are bound to traditional usage, so 
long as we define our terms. We plead 
guilty to departing from prior usage of 
the word “value,” but do not believe this 
to be reprehensible. 

We believe that our shift in operational 
definition of “value” is defensible on sev- 
eral grounds. The first is that so long 
as value is defined in terms of such di- 
mensions as Allport-Vernon scores, then 
it appears to have been clearly demon- 
strated (Postman & Schneider, 1951; 
Solomon & Howes, 1951) that the two 
variables of word frequency and word 
value are so inextricably intertwined that 
experimental tests of the influence of fre- 
quency and of value on threshold yield 
results that are difficult, if not impossible, 
to interpret in any unequivocal sense. By 
redefining value, we believe that we have 
made the value-frequency problem more 
easily testable than was previously the 
case. If we have, then a redefinition 
seems legitimate. If Stricker prefers to 
use value in another, more traditional 
sense, it is his privilege to do so, and his 
responsibility to design an experiment 
testing the influence of value, as he de- 
fines it, on frequency and on threshold. 


A second defense of our shift in opera- 
tional definition of value also comes from 
the ground rules of science. It is generally
believed that a major aim of science
is to stimulate further scientific endeavor, 
thus adding to our knowledge. It seems 
likely that many experimenters were dis- 
couraged from entering the value-fre- 
quency controversy, since after the Post- 
man and Schneider and the Solomon and 
Howes studies cited above it became 
fairly apparent that determining the in- 
fluence of value on threshold was difficult, 
if not impossible, so long as value was 
defined in terms of Allport-Vernon 
scores. As a result, little experimentation 
was done in this area, and for a number 
of years the frequency explanation was 
(quite correctly, on the basis of the avail- 
able evidence) generally accepted as 
being the most parsimonious one avail- 
able in dealing with the results of visual 
duration threshold experiments. Recent experiments (Johnson, Frincke,
& Martin, 1961; Newbigging, 1961a, 1961b) stimulated, in part, by the
study and by the definition of value with which Stricker finds fault,
suggest that frequency is not a significant variable, while value, as
we define value, is significant (although there is some question as
to why it is significant) in determining threshold.
We cannot believe that any redefinition
of word value that serves to increase the
testability of certain experimental questions,
generates research, and provides us
with a different and, we believe, more 
correct view of reality, is to be rejected 
merely because it violates traditional 
usage. 
Recently, Levine (1959a, 1959b) described a mathematical model for
hypothesis behavior in discrimination learning-set experiments. By
defining a hypothesis (H) as “a specifiable pattern of response to a
selected stimulus set” (1959b, p. 353), and making certain
assumptions regarding their nature and conditions of occurrence,
Levine was able to evaluate the strengths of various hypotheses
(e.g., position preference, stimulus preference, problem solution)
throughout learning-set formation, to analyze learning-set curves in
terms of the contributions of component Hs, and to synthesize
learning-set curves given H strengths. In his dissertation (1959a,
p. 58), Levine suggested that this approach might also be appropriate
to a description of the finding of North (1950), Pubols (1957), and
others, that rats show progressive improvement in serial position
reversal learning.

The published model is not directly
applicable to serial reversal learning 
data, but since its appearance Levine * 
has developed a general model for hy- 
pothesis behavior, applicable to both 
learning-set and serial reversal experi- 
ments. A recent experiment by the 
writer (Pubols, 1962) yielded data in 
terms of which this model might be 
tested. 

In this experiment, three groups of 10 rats each were given 410
trials in a one-unit alley Y maze. Members of Group 10 received 10
trials per reversal for original learning and 40 reversals of a
position habit; members of Group 20 received 20 trials per reversal
for original learning and 20 reversals; and members of Group 40
received 40 trials per reversal for original learning and 10
reversals (in all cases, only 10 trials were given on the final
reversal, as this sufficed for measurement purposes). Empirical
results of the study have been reported elsewhere (Pubols, 1962).
Here, we will be concerned with the application of Levine’s model to
the serial reversal data. But first, a brief summary of the model
seems in order.

SUMMARY OF LEVINE’S GENERAL MODEL 


Levine’s model is restricted to the operation of H behavior over the
first three trials of a problem. He makes the following assumptions:
(a) if an H occurs on a problem, it will be manifest over all three
trials of that problem; (b) while the determiners of response on
Trial n may include stimuli associated with Trial n − 1 (as in single
alternation), the response will be independent of Trial n − 2 stimuli
(thus excluding from consideration an H such as double alternation);
(c) all Hs are mutually exclusive; (d) the sum of the probabilities
of all operative Hs is one; (e) the selection and definition of Hs
for a given application is independent of the characteristics of the
model; and (f) “a given H has the same strength no matter in which
cell [sequence] it appears” (Levine, 1959b, p. 358).

The operations involved in estimating H strengths differ somewhat for
learning-set and for serial reversal data, and hence these will be
considered separately. For any three-trial problem in the
learning-set situation, there are four reward sequences, four
response sequences, and two conditions of Trial 1 outcome (rewarded
or nonrewarded). This yields 32 possible sequences, or cells. The
procedure for evaluating H strengths basically involves two steps.
The first step utilizes a simplifying assumption, which may be called
the assumption of symmetry: in a learning-set experiment, each
problem involves a new pair of discriminanda, and hence it is
reasonable to assume that, on each problem, the probability of a
Trial 1 positive sequence equals the probability of a Trial 1
negative sequence:


P(+_1) = P(−_1)


This being the case, the first step involves the calculation of
conditional probabilities of empirical sequences, the probability of
the empirical outcome on Trials 2 and 3, given the reward sequence
and outcome on Trial 1. Operationally, the frequency of occurrence of
a given sequence is divided by the frequency of occurrence of all
sequences having the same reward sequence and Trial 1 outcome. The
second step then involves the simultaneous solution of N equations,
representing the N hypotheses whose strengths are being evaluated,
taking into account differences in the number of cells associated
with each H. The final result, then, is a statement of the
probability of each H in each block of problems analyzed.
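
The first step described above (dividing each cell frequency by the frequency of all cells that share its reward sequence and Trial 1 outcome) can be sketched as follows. This is an illustration only: the cell labels, the tuple layout, and the counts are hypothetical and are not taken from Levine's papers or from any data set. The second step, the simultaneous solution for H strengths, is not attempted here because the equations are not given in the text.

```python
from collections import Counter, defaultdict


def conditional_probabilities(observations):
    """Return P(cell | reward sequence, Trial 1 outcome) for each observed cell.

    Each observation is a hypothetical tuple:
    (reward_sequence, trial1_outcome, trials_2_3_outcome).
    """
    cell_counts = Counter(observations)
    # Total frequency of all cells sharing a reward sequence and Trial 1 outcome.
    condition_totals = defaultdict(int)
    for (reward_seq, trial1, _), count in cell_counts.items():
        condition_totals[(reward_seq, trial1)] += count
    return {
        cell: count / condition_totals[(cell[0], cell[1])]
        for cell, count in cell_counts.items()
    }


# Made-up frequencies for a single reward sequence, labelled "LLR" here.
observations = (
    [("LLR", "+", "++")] * 6
    + [("LLR", "+", "+-")] * 2
    + [("LLR", "-", "++")] * 1
    + [("LLR", "-", "--")] * 3
)
probs = conditional_probabilities(observations)
for cell, p in sorted(probs.items()):
    print(cell, p)
```

Within each (reward sequence, Trial 1 outcome) condition the conditional probabilities sum to one, which is what lets cells from Trial 1 positive and Trial 1 negative problems be compared when the symmetry assumption holds.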

Turning now to serial reversal data, the total number of sequences
reduces from 32 to 8. This follows from the fact that there is only
one reward sequence involved; that is, on a given reversal, reward is
always on the same side or position. Further, response sequence and
reward outcome become completely confounded. Defined in terms of
reward outcomes, the eight serial reversal sequences are, for Trial 1
negative: − − −, − − +, − + −, and − + +; for Trial 1 positive:
+ + +, + + −, + − +, and + − −.

In the case of serial reversal (and also, incidentally, oddity
learning) experiments, the assumption of symmetry is not met. As
serial reversal learning progresses, the probability of a Trial 1
negative sequence increases while that of a Trial 1 positive sequence
decreases; hence, an H will come to have a greater strength in the
former case. When the assumption of symmetry is not met, Levine * has
shown that the use of conditional probabilities of empirical outcomes
is not appropriate; rather absolute probabilities should be
calculated.

Thus, for the evaluation of serial reversal data, absolute
probabilities of sequence occurrence should be utilized. The absolute
probability of a given sequence is defined as the ratio of the
frequency of that sequence to the frequency of all sequences:


P(ith sequence) = (frequency of ith sequence) / (frequency of all sequences)    [1]
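
As a rough illustration of the absolute-probability definition, the sketch below enumerates the eight three-trial outcome sequences and divides each observed frequency by the frequency of all sequences. The observed counts are invented for the example and the function name is hypothetical.

```python
from collections import Counter
from itertools import product

# The eight outcome sequences over Trials 1-3 ("-" nonrewarded, "+" rewarded).
SEQUENCES = ["".join(s) for s in product("-+", repeat=3)]


def absolute_probabilities(observed):
    """P(ith sequence) = frequency of ith sequence / frequency of all sequences."""
    counts = Counter(observed)
    total = len(observed)
    return {seq: counts[seq] / total for seq in SEQUENCES}


# Made-up observations: 50 three-trial sequences from one block of reversals.
observed = ["---"] * 20 + ["-++"] * 15 + ["--+"] * 10 + ["+++"] * 5
probs = absolute_probabilities(observed)
for seq in SEQUENCES:
    print(seq, probs[seq])
```

Unlike the conditional probabilities used for learning-set data, these values sum to one over all eight sequences together, so a growing share of Trial 1 negative sequences shows up directly in the estimates.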


APPLICATION OF THE GENERAL MODEL 


The analysis of H behavior for the data of the Pubols experiment is
based on reversals only, not on the original learning. Reversals were
grouped into blocks of five, yielding 50 scores per block
(5 reversals × 10 rats). This gives, for Groups 10, 20, and 40,
respectively, eight, four, and two blocks for analysis. Table 1
outlines the hypotheses chosen for analysis, along with other
relevant information. Equation 1 now becomes:


P(ith sequence) = (frequency of ith sequence) / 50


After these sequence probabilities are computed, r, a, b, p2, and p3
are obtained in the usual way, and the final H probabilities are
defined as R = 8r, A = 2a, and B = 2b, p2 and p3 remaining
unchanged.
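
The bookkeeping in this paragraph can be sketched in a few lines. The constants come from the text (blocks of five reversals, 10 rats per group, multipliers 8, 2, and 2); the function names and the per-cell values passed in at the end are hypothetical, and the step that produces r, a, b, p2, and p3 "in the usual way" is not reproduced because the text does not spell it out.

```python
RATS_PER_GROUP = 10
REVERSALS_PER_BLOCK = 5
SCORES_PER_BLOCK = REVERSALS_PER_BLOCK * RATS_PER_GROUP  # denominator of Equation 1


def n_blocks(n_reversals: int) -> int:
    """Number of five-reversal blocks available for analysis."""
    return n_reversals // REVERSALS_PER_BLOCK


def hypothesis_probabilities(r, a, b, p2, p3):
    """Scale per-cell strengths by the multipliers stated in the text."""
    return {"R": 8 * r, "A": 2 * a, "B": 2 * b, "p2": p2, "p3": p3}


blocks = {group: n_blocks(n) for group, n in
          {"Group 10": 40, "Group 20": 20, "Group 40": 10}.items()}
print(SCORES_PER_BLOCK, blocks)

# Illustrative per-cell strengths only (not data from the experiment).
h = hypothesis_probabilities(r=0.05, a=0.10, b=0.05, p2=0.20, p3=0.10)
print(h, sum(h.values()))
```

The illustrative values were chosen so that the five H probabilities sum to one, in line with assumption (d) of the model.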

Results of the application of the model are shown in Figures 1 and 2.
Figure 1 presents the experimental data as analyzed into the various
Hs. Inspection of the figure reveals an interesting happening.
Whereas all three groups show interreversal learning (progressive
increase in p2 + p3), this learning is correlated, in different
groups, with the diminution of different “incorrect” Hs. Thus, for
Group 10, interreversal learning is accompanied by a systematic
decrease in position alternation, B; for Group 20, there is a
decrease primarily in random responding, R; and for Group 40, it is
position preference, A, which systematically decreases.


* Personal communication, January 11, 1961.


TABLE 1


HYPOTHESES EVALUATED IN THE PRESENT ANALYSIS


Hypothesis                  Probability symbol   Associated sequences
Random responding           R                    All sequences
Position preference         A                    [illegible]
Position alternation        B                    [illegible]
Trial 2 problem solution    p2                   [illegible]
Trial 3 problem solution    p3                   [illegible]

[The "Number of sequences" column is illegible in the source.]


In order to obtain some indication of the reliability of the trends
just described, each curve of Figure 1 was subjected to an individual
trend test.⁵ For these tests, H strengths in each rat on each block
of reversals were computed, and the error term was the blocks by
subjects interaction. As expected, in Group 10, the trend in the
problem solution curve was significant (p < .01), as was the trend in
the position alternation curve (p < .001). For Group 20, only


the trend in the problem solution curve was significant (p < .05),
and for Group 40, only the trend in the position preference curve was
significant (p < .05). The failure of significance in the case of the
Group 20 random responding, and the Group 40 problem solution curves
can probably be attributed to the loss in degrees of freedom
resulting from the combination of reversals into blocks. Certainly,
in the case of Group 40, there is independent evidence (Pubols, 1962)
that interreversal learning was genuine.


5 Because of different numbers of reversal blocks in the three
groups, and because within an individual subject all hypotheses are
not statistically independent, it was not possible to run any
over-all trend tests.


FIG. 1. Hypothesis probabilities as a function of reversal block.
(Panels: problem solution, p2 + p3; random responding, R; position
preference, A; position alternation, B. Abscissa: reversal block.)


It seems that, as the number of trials per reversal increases, the
initially dominant hypothesis changes from position alternation,
possibly to random responding, and then to position preference. The
strengths of these Hs were, in the indicated groups, very high from
the first reversal, suggesting the importance of the number of trials
on the original problem. In the case of Group 40, the position
preference hypothesis manifests itself exclusively in the sequence,
− − −, which indicates strong perseveration of the formerly correct
response. With fewer trials per reversal, this perseverative effect
is less strong and incorrect responding is manifest in an alternation
or a random pattern.

Figure 2 compares conventional (empirical) serial reversal curves
based on Trial 2 performance with curves synthesized according to
Levine’s Equation 6 (1959b, p. 363). The fit is seen to be best for
Group 10, poorest for Group 40. Trial 3 curves were also synthesized,
but the fit was less close in all three cases, primarily by way of
overestimation.


FIG. 2. Comparison of empirical and synthesized serial reversal
functions, in terms of Trial 2 percentage correct. (Panels: Group 10,
Group 20, Group 40. Ordinate: per cent correct. Abscissa: reversal
block. Filled circles: empirical; open circles: synthesized.)


DISCUSSION 


The general model appears to have as much utility when applied to
serial reversal learning as when applied to learning-set formation.
Where the probability of a Trial 1 negative sequence equals the
probability of a Trial 1 positive sequence, the strength of Hs may be
evaluated in terms of either absolute or conditional probabilities of
empirical outcomes. In contrast, where Trial 1 positive and negative
sequences have unequal probabilities, the strength of Hs is
appropriately evaluated only in terms of absolute probabilities.

There yet remains one discrepancy between the model and serial
reversal data. This discrepancy involves the assumption that a given
H has the same strength in all cells in which it appears. This
assumption implies, for example, that P(− − −) = P(+ + +), and that
P(− + −) = P(+ − +). That these equalities do not hold follows from
the fact that P(−_1) > P(+_1). However, the analytical utility of the
model, as indicated by Figure 1, and the close correspondence between
empirical and synthesized functions, as indicated by Figure 2,
suggest that, in practice, this may not be a serious shortcoming.


Levine (1959b, pp. 365-366) compares explanations of “learning to
learn” that have been suggested in the literature. Harlow’s view
(1959; see also Harlow & Hicks, 1957) is that interproblem (and
presumably interreversal) learning involves the weakening and
eventual elimination of incorrect response tendencies, or, in the
present context, Hs not consistently reinforced. An opposing view
(Restle, 1958) emphasizes the strengthening of rewarded responses
(Hs). While our data show that increases in p2 and p3 are accompanied
by decreases in the strength of other Hs, it cannot be concluded, as
Harlow implies, that the latter is the cause of the former. Nor, of
course, can the reverse be concluded. We are in agreement with Levine
(1959b, pp. 365-366) that, in terms of the model for hypothesis
behavior, learning to learn is best described as the strengthening of
certain Hs through 100% reinforcement and the weakening (extinction)
of other Hs through 50% reinforcement.
A COMPARISON OF THE POWER OF THE U AND t TESTS ¹


C. ALAN BONEAU


Duke University 


In a recent paper (Boneau, 1960), the author summarized the results
of a number of theoretical and empirical studies dealing with the
effects of violations of assumptions underlying the t test. It was
concluded that the t test is remarkably unaffected by the two common
violations: sampling from populations having unequal variances and
sampling from nonnormal distributions. One who uses the t test can be
reasonably sure that the probability of rejecting a true null
hypothesis is close to the alpha value he selects for his experiment
even though he may have misgivings about the assumptions upon which
the t test is based. As a result of these considerations, a
recommendation was made in the previous paper to the effect that even
when the assumptions are not met (except under special conditions)
the t test and the F test on means of the analysis of variance be
used without those attendant feelings of turpitude which can be
attributed to an introjection of the strictures of the proponents of
nonparametric methods.

This recommendation was based upon an assumption that the t and F
tests, because they make effective use of the information in the
sample and have other desirable properties, should be more powerful
techniques than nonparametric competitors. (For readable explanations
and discussions of the power of a test see among others Siegel, 1956
or Walker & Lev, 1953.) That is to say, if the null hypothesis is
false (if there are true differences between means), the t test
should signal the detection of small differences by yielding
significant results more frequently than should comparable
nonparametric methods.


1 This project was supported by NSF Grant G-9592 to the author and by
NSF Grant G-6694 to the Duke University Digital Computing Laboratory.
The author wishes to express appreciation to Thomas M. Gallie,
Director of the Laboratory, for his cooperation and assistance.

Theoretically, the t test is more powerful than any of the usually
utilized tests when the assumptions underlying it are met. It is also
true, however, that when sampling is from certain nonnormal
distributions, other tests may be more powerful than the t test. The
Wilcoxon-Mann-Whitney U test (Wilcoxon, 1945; Mann & Whitney, 1947)
for example, is, in one pathological case, theoretically infinitely
more powerful than the t test. Theoretically also, the power of the U
test is never less than .83 of that of the t test (Hodges & Lehman,
1956). In fact even in the case for which the t test is designed
(normality and equal variances) the U test by one measure of the
relative power of the two tests is 95% as powerful as the t test
(Hodges & Lehman, 1956).

Such theoretical statements about relative power of tests are based
upon mathematical limiting processes involving conditions which are
not representative of most practical situations, for example,
infinitely large sample sizes and arbitrarily small differences
between population means (Dixon, 1954; Hodges & Lehman, 1956; and
Mood, 1954). Statements as to the relative efficiency in general of
various nonparametric competitors of the t test are scattered
throughout the literature (Dixon, 1954; Hodges & Lehman, 1956;
Lehman, 1953; Mood, 1954, to mention only the relatively accessible
ones). They seem not, however, to have permeated effectively that
hard core of statistical lore which the research psychologist musters
in an attempt to wrest truth from chaff.

The present paper is intended to present the facts (culled from the
literature as well as manufactured for the purpose) about the power
of the t test and, in particular, how that power compares with the
power of a specific nonparametric competitor in various practical
situations. This presentation is meant to temper in part the
implications of the previous paper (Boneau, 1960) that the t test
should be used whenever possible. It would seem that here, as in
other areas of human endeavor, a little discretion may pay off.

Attention will be focused upon the U test, a worthy protagonist whose
principal strengths vis-à-vis the t test have already been mentioned.
The U test, or equivalent versions of procedures based upon ranked
scores, has been invented several times in the history of statistics,
first by Deuchler (1914), but later by Wilcoxon (1945), Mann and
Whitney (1947), among others. (See Kruskal, 1957, for historical
discussion.) As used in the present context, the statistic U is
computed by determining the number of scores in the second sample
which are exceeded by each score in the first sample. The sum of all
such counts summed over the scores in the first sample is called U,
tables for which have been developed by Mann and Whitney (1947),
extended by Auble (1953), and made readily accessible by Siegel
(1956). Wilcoxon’s T test (1945), although limited to equal-size
samples, gives exactly equivalent results even though it is computed
in a different fashion.

The null distribution of U may be derived from the assumption that
ranks are assigned to the two samples on a random basis such that
every combination of ranks among samples is equally likely. For
example, given that n1, the first sample size, is 2 and n2 is 3, the
possible values of the two ranks in the first sample (assuming no
ties) are 1 and 2, 1 and 3, 1 and 4, 1 and 5, 2 and 3, 2 and 4, 2 and
5, 3 and 4, 3 and 5, and finally, 4 and 5. By the definition above,
these lead to U values of 6, 5, 4, 3, 4, 3, 2, 2, 1, and 0,
respectively. If all of these combinations are equally likely (the
two samples came from the same distribution, for example), the
probability of getting values as extreme as 6 or 0 by chance is the
sum of the individual probabilities:


1/10 + 1/10 = 1/5 
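
The enumeration in this example is small enough to reproduce directly. The sketch below is an illustration, not the author's procedure: the function name is hypothetical, and it assumes that rank 1 denotes the highest score, which is the reading under which the U values listed in the text come out in the stated order.

```python
from fractions import Fraction
from itertools import combinations


def u_from_ranks(first_ranks, n_total):
    """Count second-sample scores exceeded by each first-sample score.

    Assumption: rank 1 is the highest score, so a first-sample score exceeds
    a second-sample score when its rank number is smaller.
    """
    second_ranks = [r for r in range(1, n_total + 1) if r not in first_ranks]
    return sum(1 for f in first_ranks for s in second_ranks if f < s)


n1, n2 = 2, 3
rank_pairs = list(combinations(range(1, n1 + n2 + 1), n1))
u_values = [u_from_ranks(pair, n1 + n2) for pair in rank_pairs]

# Probability of a U as extreme as 0 or n1 * n2 when all rank pairs are equally likely.
p_extreme = Fraction(sum(1 for u in u_values if u in (0, n1 * n2)), len(u_values))

for pair, u in zip(rank_pairs, u_values):
    print(pair, u)
print(p_extreme)
```

With only ten equally likely rank combinations, the smallest attainable two-tailed probability is 1/5, so samples this small cannot reach conventional significance levels.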
By the logic of statistical decisions, however,
one attributes extreme values of U
not to these accidents of random sampling 
which occur with known probability if 
the null hypothesis is true, but to actual 
differences between the distributions. In 
actuality, differences in distribution usu- 
ally lead to non-equally-likely combina- 
tions of ranks. Ranks 1 and 2 occur 
together with relatively greater frequency 
as the difference between the means of 
the populations increases and as a result 
more frequent extreme U values occur. 
For this reason U is generally considered 
to be a test of displacement or shift of 
distributions, the main focus being on 
differences in central tendency. Note 
also that discrepancies of variance tend 
also to produce non-equally-likely combinations
of ranks. If two populations
have the same mean, the values 1 and 5 
tend more to occur together as the size 
of the variance of the first increases rela- 
tive to the size of that of the second. 
However, these combinations give rise to 
middling values of U and hence are ig- 
nored, in effect, by the U test. We shall 
see, however, that there are cases for 
which differences in distribution other 
than central tendency affect the value 
of U. 

One further statement might be made about the sensitivity of the U
test in specific cases. Since essentially it is based only upon
ranks, first and last scores get rank 1 and n whether they are close
to the mean or several standard deviations away. Thus the occasional
score which is apparently not in the distribution but which furnishes
no real justification for exclusion is treated as a member of the
ingroup by the U test but as the pariah it may well be by the t.

The method of the present paper is 
quite similar to the approach followed by 
the author in the previous paper dealing 
with the probability of rejecting the null 
hypothesis if it is true. In that study 
populations having specified characteris- 
tics were constructed and the values of 
t arising from the differences between 
means of random samples drawn from 
them were computed. The empirical 
probability of rejecting the null hypothesis
was obtained by determining the proportion
of sample t values falling outside
the ordinary tabled values for the appro- 
priate number of degrees of freedom (i.e., 
falling in the region of rejection). Since 
the null hypothesis was indeed true, these 
empirical probabilities or proportions 
could be compared with the nominal values to determine the effects of
modifying the specified characteristics of the populations in such a
way as to violate the assumptions underlying the t test. In the
present study, the concern is with the proportion of obtained t’s and
Us falling in the region of rejection (or critical region) when the
null hypothesis is false, that is, when there is a built-in,
specified difference between means. To generate this information, the
only required addition to the previous t test program is a provision
for changing the mean of the first sample to any value desired.

The program using the IBM 650 Electronic Data Process System for
generating t’s from random deviates was discussed in detail in the
previous paper (Boneau, 1960). To summarize briefly, 10-digit random
numbers were generated by a multiplication process. These were
converted into random deviates from a specified population by a
table-look-up procedure; the random deviates were injected into the
computing formula for the t test for the difference between means of
two independent samples; and the resulting t value was sorted and
tallied on an internally contained table within the computer.

The program designed for the present study for the U test utilized
the existing random number and random deviate generating procedures.
The value of U was computed by the simple expedient of subtracting
every score in the first sample from every score in the second sample
and counting the number of minus signs which resulted. This number is
U by definition. The possibility for a tie was ignored since the
expected rate of ties was approximately one per thousand Us, a rate
which, while significant, would seem to have little effect on the
observed results. As in the t program, the obtained U values were
sorted and tallied on an internally contained table which was punched
out in card form when the desired number of Us from the specified
populations was reached.
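
The subtraction procedure described above translates almost word for word into code. The following is a minimal sketch in Python rather than a reconstruction of the IBM 650 program; the function name and the sample values are hypothetical.

```python
def u_by_subtraction(first, second):
    """Subtract every first-sample score from every second-sample score
    and count the negative differences (the "minus signs")."""
    return sum(1 for x in first for y in second if (y - x) < 0)


# Made-up scores: two in the first sample, three in the second.
first_sample = [5.1, 3.2]
second_sample = [4.0, 2.5, 1.0]
u = u_by_subtraction(first_sample, second_sample)
print(u)
```

A tied pair gives a difference of exactly zero, which is not a minus sign, so ties simply go uncounted here. That matches the text's decision to ignore them, although a full treatment would usually assign ties half a count.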


RESULTS 

Since the results of the study will be 
expressed in terms of empirical power 
functions, the investment of a small
amount of space to elaborate on the 
method of their determination may be in 
order at this point. 

For any given set of conditions, i.e., combination of means,
variances, and distributions, the result of the computer procedure is
a set of t’s or Us which may be arranged in a frequency distribution.
Figure 1 shows two such distributions. The distributions are of t’s
obtained on the basis of sample sizes of 15 from normal populations
having a variance of 1.0. One of the distributions shown, that
centered around the t value of zero, arose when the means of the two
samples were both equal to zero. For the other distribution, that
centered around 2.8 and shaded, the difference between the means of
the two samples was equal to 1.0, that is, equal to one standard
deviation. The vertical lines divide the range of possible t values
into two regions. The unhatched region marks off those values of t
which result in a decision to accept (or fail to reject) the null
hypothesis at the .05 level. The hatched regions mark those values of
t which result in a decision to reject the null hypothesis at that
level. As can be seen, when the null hypothesis is true most of the
obtained t’s fall in the region of acceptance. The proportion which
fall in the region of rejection, in this case .049, we have called
the empirical alpha level.

Sometimes a decision is made to accept 
a false null hypothesis, the so-called beta 
error. In Figure 1, such an error would 
be made in the case of those values of t 
from the shaded distribution which fall 
in the region of acceptance. On the other 
hand, decisions to reject the null hypothe- 
sis when it is indeed false occur for that 
proportion of the t's from the shaded
distribution which fall in the region of
rejection, in this case .737. It is this
proportion which we call the empirical power
of the test under the given conditions. It
should be noted that both the empirical
alpha and the empirical power are
estimates of the exact theoretical values
which obtain for these conditions.

Empirical t distribution for mean
difference of (a) 0.00 (unshaded area) and
[...] (The hatched area is the region of
rejection for the .05 level of significance.)

The data of the present study are
empirical power values considered as
functions of the actual difference between
population means. We shall consider
separately the functions for the one- and
two-tailed tests and for the .05 and .01
values of alpha. The figures which will
be presented will, for a given set of
conditions, depict such functions for both
the t and U tests so that visual
comparisons may be made.

One further thing should be said about
nomenclature before results are presented.
As in the previous paper, the conditions
of sampling will be symbolically
represented. For example, N(2,1)5-N(0,1)15
indicates that the first sample is from a
normal population N with a mean of 2
and a variance of 1, the sample size being
5. In this instance, the second sample
is from a normal distribution having a
mean of 0 and a variance of 1, and the
sample size is 15. In the text and figures,
we will use the variable “x” for the value
of the first mean to indicate that it takes
on the several values necessary for the
μ₁ - μ₂ values on the abscissa. In all
cases the value of the mean of the second
distribution is zero. One thousand U's
and t's were obtained for each condition.
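
The sampling procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used in the study: the function names, the random seed, the pooled-variance form of t, and the use of the tabled two-tailed .05 critical value for 8 degrees of freedom (2.306) are all choices made for the example. It draws 1,000 pairs of samples under N(x,1)5-N(0,1)5 and reports the proportion of t's that fall in the region of rejection.

```python
import math
import random
import statistics

def pooled_t(a, b):
    """Student's t for two independent samples, using the pooled variance."""
    n1, n2 = len(a), len(b)
    ss1 = statistics.variance(a) * (n1 - 1)
    ss2 = statistics.variance(b) * (n2 - 1)
    pooled = (ss1 + ss2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.fmean(a) - statistics.fmean(b)) / se

def empirical_rejection_rate(mean1, n1=5, n2=5, reps=1000, crit=2.306, seed=1):
    """Proportion of |t| values beyond crit under N(mean1,1)n1-N(0,1)n2."""
    rng = random.Random(seed)  # hypothetical seed, fixed only for repeatability
    rejections = 0
    for _ in range(reps):
        a = [rng.gauss(mean1, 1.0) for _ in range(n1)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n2)]
        if abs(pooled_t(a, b)) > crit:
            rejections += 1
    return rejections / reps

empirical_alpha = empirical_rejection_rate(0.0)  # null hypothesis true
empirical_power = empirical_rejection_rate(2.0)  # true mean difference of 2.0
print(empirical_alpha, empirical_power)
```

With the null hypothesis true, the first figure estimates alpha; with a nonzero mean difference, the second estimates power. Both are subject to sampling error of the kind discussed in the text.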


Normal Distribution: Homogeneous 
Variances 

First to be considered are the cases 
in which the sizes of both samples are 
the same. Figure 2 depicts the vari- 
ous empirical power functions obtained 
when the condition of sampling is 
N(x,1)5-N(0,1)5. Because of the dis- 
crete nature of the U distribution, exact 
alpha values of .05 and .01 are not
possible. When both sample sizes are 5, the
following alpha values obtained from 
tables are used: .056 for the two-tailed 
test and .048 for the one-tailed test in 
place of .05, and .008 for both the one- 
and two-tailed test in place of .01. 
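
The attainable alpha levels can be checked by enumerating the null distribution of U directly. The sketch below assumes U is computed as the rank sum of the first sample minus its minimum possible value, and that under the null hypothesis all 252 assignments of ranks to the first sample are equally likely; the function names and the cut-off values of U (4, 3, 1, and 0) are inferred for the illustration, chosen because they reproduce the tabled levels.

```python
from collections import Counter
from itertools import combinations

def u_null_distribution(n1, n2):
    """Exact null distribution of U for sample sizes n1 and n2 (no ties)."""
    counts = Counter()
    for ranks in combinations(range(1, n1 + n2 + 1), n1):
        # U = rank sum of sample 1 minus its smallest possible value
        counts[sum(ranks) - n1 * (n1 + 1) // 2] += 1
    total = sum(counts.values())
    return {u: c / total for u, c in sorted(counts.items())}

def lower_tail(dist, c):
    """P(U <= c) under the null hypothesis."""
    return sum(p for u, p in dist.items() if u <= c)

dist = u_null_distribution(5, 5)
one_tailed_05 = lower_tail(dist, 4)      # 12/252
two_tailed_05 = 2 * lower_tail(dist, 3)  # 14/252
one_tailed_01 = lower_tail(dist, 1)      # 2/252
two_tailed_01 = 2 * lower_tail(dist, 0)  # 2/252
print(round(one_tailed_05, 3), round(two_tailed_05, 3),
      round(one_tailed_01, 3), round(two_tailed_01, 3))
# → 0.048 0.056 0.008 0.008
```

Because U takes only 26 distinct values when both samples are of size 5, the tail probabilities move in steps of 1/252, which is why no cut-off yields exactly .05 or .01.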

A number of interesting facts can be 
observed in Figure 2. First is the ex- 
pected superiority of the one-tailed test 
to the two-tailed test. Of prime
importance for the present paper is the
remarkable lack of difference in power of
the t and U tests over the range of values
presented. Only in the case of the
two-tailed test at the .008 level does the
t test seem to exhibit a definite
superiority and here only for gross
differences between means. In most of the
other cases, the differences between the
obtained points are well within the margin
of sampling error (i.e., not significantly
different). This would indicate that
although the t test is theoretically a
uniformly (over all mean differences) most
powerful test, the margin over the U test
is not very much. Because of this power
property of the t test under these
conditions, the points at which the U test
shows a superiority must be attributed
to sampling error. It is also clear that
the alpha values (μ₁ - μ₂ = 0) are
virtually identical and are approximately
the theoretical values expected.

Fig. 2. Empirical power functions for U
and t tests with sampling scheme
N(x,1)5-N(0,1)5.

Increasing both sample sizes from 5 to
15 tends, as shown in Figure 3, to
increase the power of the test, but to leave
virtually unaffected the things which we
have said about the smaller sizes. For
example, at the .05 level for a one-tailed
test, the power of both tests for a
difference between means of 1.0 is around
.42 for sample sizes of 5, and .85 for
samples of size 15.

Fig. 3. Empirical power functions for U
and t tests for sampling scheme
N(x,1)15-N(0,1)15.

Still maintaining the condition of
sampling from a normal distribution with
equal variances of the parent populations,
we may allow the sample sizes to be
different, 5 as opposed to 15, and generate
another series of curves, Figure 4. The
results are similar to those preceding,
with the power for the one-tailed test at
the .05 level and a mean difference of 1.0
being in this case about .60. This, too,
is to be expected since the power of the
test, among other things, is an increasing
function of the difference between the
means but a decreasing function of the
expected standard error of the difference
between means, where


$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$$


Figure 4 shows more clearly than any of 
the preceding graphs the superiority of 
the t test to the U test, but the largest
obtained difference is only .12, and being 
the largest probably overestimates the 
true difference. 
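
The ordering of the three power values quoted for a mean difference of 1.0 (about .42, .60, and .85) can be anticipated from the standard error alone. The sketch below is a rough check under an added assumption: it replaces the t distribution with a normal approximation, which tends to overstate power somewhat for samples this small, so only the ordering and the general magnitude should be read from it. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def se_diff(var1, n1, var2, n2):
    """Standard error of the difference between two sample means."""
    return math.sqrt(var1 / n1 + var2 / n2)

def approx_power_one_tailed(diff, se, alpha=0.05):
    """Normal-approximation power of a one-tailed test (an assumption, not t)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(diff / se - z_crit)

schemes = {
    "N(x,1)5-N(0,1)5": (5, 5),
    "N(x,1)5-N(0,1)15": (5, 15),
    "N(x,1)15-N(0,1)15": (15, 15),
}
results = {}
for label, (n1, n2) in schemes.items():
    se = se_diff(1.0, n1, 1.0, n2)
    results[label] = (se, approx_power_one_tailed(1.0, se))
    print(label, round(se, 2), round(results[label][1], 2))
```

The standard errors come out near .63, .52, and .37, so a fixed mean difference of 1.0 represents progressively more standard errors as the samples grow, and the approximate power rises accordingly.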

Fig. 4. Empirical power functions for U
and t tests for sampling scheme
N(x,1)5-N(0,1)15.

The results we have observed thus far
are for conditions in which the
assumptions underlying the t test are
satisfied. Under these conditions we know
that theoretically the t test is the most
powerful test of the difference between two
means. The power functions and
relations between the powers of the two
tests obtained in this study are comparable
with those found in several sources
(Dixon, 1954; Dixon & Massey, 1957;
Ferris, Grubbs, & Weaver, 1946, to name
a few). Thus they contribute only a
confirmation of the general method and at
the same time furnish a graphical
demonstration of the fact that, while less
powerful than the t test, the U test
performs quite well in those situations for
which the t test is expressly suited.

When the assumptions of the t test
are not fulfilled, it is not necessarily a
most powerful test. Moreover, as stated
earlier, there are theoretical conditions
for which the t test is considerably less
powerful than other tests, including the
U test. The remainder of this paper will
compare power functions for the U and t
tests for those violations of assumptions
which arise from various combinations of
the three distributions, the two variances,
and the two sample sizes which have been
selected for the study. We will determine
whether the power functions for the t test
show any drastic deviations from those
power functions we have already seen,
and we shall discover those cases, if any,
for which the U test performs better than
does the t test. This, of course, must be
within the limitations imposed by the
selection of conditions.


Normal Distributions: Heterogeneous 
Variances 


Initially, we shall proceed by
examining the effect of violating the
assumption of homogeneity of variances with
normal distributions. In the previous
study, it was determined that inequality of
variances up to at least a ratio of 1 to 4
produced very little effect on alpha provided
the sizes of the two samples are the same.
If the sample sizes are different, gross 
disturbances in alpha occur. 

Figure 5, depicting the
equal-sample-size case (n = 5), reveals that
for t the alpha is relatively undisturbed
and power also seems little affected. It
should be noted that the relatively low
power for Figure 5 (and for some following
figures) for a given mean difference is to
be attributed to a change in the standard
error of the difference between means
occasioned by the change in σ₂² from 1
to 4. For a given mean difference in
standard error units, the power remains
essentially the same. For example, the
true standard error of the difference
between means for the case N(x,1)5-N(0,1)5
is .63, while for the case N(x,1)5-N(0,4)5
it is 1.0. A mean difference of 2.0
(2 standard errors) in Figure 5 shows the
power of the two-tailed t test at the .056
level to be approximately .45. On
Figure 2, a mean difference of 1.26
(2 standard errors) shows the comparable
power to be approximately .44.
In terms of the true standard error of the
difference between sample means, then,
the power of the t test seems relatively
unaffected by violating the homogeneity
of variances assumption.
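
As a worked check on the two standard errors used in this comparison, the formula for the standard error of the difference between means can be evaluated with the variances and sample sizes read from the sampling notation (the symbols follow that formula; the arithmetic is added here for illustration):

```latex
\begin{align*}
N(x,1)5\text{-}N(0,1)5:\quad
  \sigma_{\bar{x}_1-\bar{x}_2} &= \sqrt{\tfrac{1}{5}+\tfrac{1}{5}} = \sqrt{.40} \approx .63,
  & 2\,\sigma_{\bar{x}_1-\bar{x}_2} &\approx 1.26 \\
N(x,1)5\text{-}N(0,4)5:\quad
  \sigma_{\bar{x}_1-\bar{x}_2} &= \sqrt{\tfrac{1}{5}+\tfrac{4}{5}} = \sqrt{1.00} = 1.0,
  & 2\,\sigma_{\bar{x}_1-\bar{x}_2} &= 2.0
\end{align*}
```

A mean difference of 1.26 in the homogeneous case and of 2.0 in the heterogeneous case therefore both lie two standard errors from zero, which is what makes the two power values comparable.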

Likewise, the power of the U test is
maintained, but at a level again slightly
lower than that of the t test. Similar
results (not shown) obtain when both 
sample sizes are changed to 15. 

Introducing heterogeneity of sample
size as well as heterogeneity of variance
produces discrepancies which might be
predicted from what is already known
about the effect of this condition on the
alpha level. Figures 6 and 7 portray
these effects for the two possible
combinations of variance and sample size
considered, N(x,1)5-N(0,4)15 and
N(x,4)5-N(0,1)15 respectively. The
first thing to observe in these figures is
the effect on the alpha level (the power
for a mean difference of zero). In
Figure 6 the alpha is for all curves less
than the nominal values of .05 and .01,
although the magnitude of the difference
for the U curves is less than those for t.
The actual values observed are as
follows: (one-tailed test, .05 level)
t = .010; U = .029; (two-tailed test,
.05 level) t = .009; U = .027;
(one-tailed test, .01 level) t = .002;
U = .005; (two-tailed test, .01 level)
t = .001; U = .004. The power functions
reflect the fact that while nominally at
a .05 level, the tests are actually
operating at a reduced alpha value. At
the reference value, a mean difference
equal to two standard errors, the power of
the two-tailed .05-level t test is
approximately .24, while the power for
the comparable U test is about .28. Both
of these values are much less than the
values of about .45 which obtained under
the other conditions. We may conclude
that the conditions of heterogeneity of
variances and sample sizes which
produced Figure 6 have affected the alpha
level of both U and t, U to a lesser
extent, however, than t. Since the alpha
level for the U test under these conditions
is greater than that for the t test, the
resulting power functions should and do
reflect a superiority of U to t. What we
are observing is the power curve for t
for an alpha level of about .01 and for
U of about .03 when we consider, for
example, the curves which were
constructed on the basis of the normal
boundaries of the region of rejection for
the .05 level of alpha.

Fig. 6. Empirical power functions for U
and t tests for sampling scheme
N(x,1)5-N(0,4)15.

Fig. 7. Empirical power functions for U
and t tests for sampling scheme
N(x,4)5-N(0,1)15.

We may make exactly the same
comparisons for the results depicted in
Figure 7, but note that the effects are in
the opposite direction to those found in
Figure 6. For example, the alpha values
obtained are all greater than the nominal
values, and again the distortion to the
t values is greater than that for U. For
this case, N(x,4)5-N(0,1)15, the actual
alpha values observed are: (one-tailed
test, .05 level) t = .115; U = .081;
(two-tailed test, .05 level) t = .145;
U = .064; (one-tailed test, .01 level)
t = .048; U = .020; (two-tailed test,
.01 level) t = .058; U = .021.
Considering the reference point used
earlier, the empirical power of the
two-tailed test with alpha equal to .05
for a difference between means equal to
two standard errors of the difference
between means, we find these values to be
.75 for t and .52 for U. And as before,
the power curves behave as if they were
the power curves for the observed value
of alpha, the curves for t being
considerably above those for U at all points.
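
The opposite directions of the two distortions can be reproduced with a short simulation. The sketch below is an illustration only: it assumes the pooled-variance t statistic, uses the tabled two-tailed .05 critical value for 18 degrees of freedom (2.101), and fixes an arbitrary seed; the function names are hypothetical. Note that the sampling notation gives variances, whereas the random generator takes standard deviations.

```python
import math
import random
import statistics

def pooled_t(a, b):
    """Student's t for two independent samples, using the pooled variance."""
    n1, n2 = len(a), len(b)
    ss1 = statistics.variance(a) * (n1 - 1)
    ss2 = statistics.variance(b) * (n2 - 1)
    pooled = (ss1 + ss2) / (n1 + n2 - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(
        pooled * (1 / n1 + 1 / n2))

def empirical_alpha(var1, n1, var2, n2, crit, reps=1000, seed=2):
    """Two-tailed rejection rate when both population means are zero."""
    rng = random.Random(seed)
    sd1, sd2 = math.sqrt(var1), math.sqrt(var2)
    hits = 0
    for _ in range(reps):
        a = [rng.gauss(0.0, sd1) for _ in range(n1)]
        b = [rng.gauss(0.0, sd2) for _ in range(n2)]
        hits += abs(pooled_t(a, b)) > crit
    return hits / reps

CRIT_DF18 = 2.101  # tabled two-tailed .05 value of t for 18 df
alpha_fig6 = empirical_alpha(1, 5, 4, 15, CRIT_DF18)  # N(0,1)5-N(0,4)15
alpha_fig7 = empirical_alpha(4, 5, 1, 15, CRIT_DF18)  # N(0,4)5-N(0,1)15
print(alpha_fig6, alpha_fig7)
```

When the larger sample has the larger variance, the pooled variance overstates the standard error and the test becomes conservative; when the smaller sample has the larger variance, the reverse holds and the test rejects too often.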

We may conclude that violating the
assumption of homogeneity of variance
if the underlying distributions are normal
has little effect on either the alpha level
or the power of the t or the U test as long
as sample sizes are the same. The
violation of this assumption coupled with
heterogeneous sample sizes changes the
alpha level of both the t and the U tests
and produces power functions which
seemingly are roughly appropriate for the
true alpha level rather than the nominal
one. The U test seems much less
disturbed by this particular violation than
does the t test, but it is by no means true
that the U test is completely unaffected
as would seem to be implied by the term
“nonparametric.” Rather it seems to
behave much as does the t test, but
somewhat less sensitively to the violation of
the assumption of homogeneity.


Nonnormal Distributions 


At this point we will examine the
empirical power functions for the t and the
U tests when sampling for at least one
sample is from nonnormal distributions.
In this way we will observe the effects,
if any, on the functions if one or both of
the parent populations is exponential, or
if one or both of the parent populations is
rectangular. From an examination of
the empirical alpha for the t test, we
already know that the effects of sampling
from a rectangular distribution are
minimal on alpha. When the exponential
distribution is involved, however, some
perturbations in alpha may occur because of
differences in skewness of the two
distributions. The magnitude of these
disturbances as seen in the earlier study was
progressively lessened as sample size
increased. At sample sizes of 25, the
effect was virtually unnoticeable.

These earlier observations are
confirmed in this study. Figure 8, for
example, depicts the results of sampling
from two rectangular distributions,
R(x,1)5-R(0,1)5. Alpha is approximately
the correct value for both levels
for both tests. In this figure, it appears
that the power of t is quite generally
greater than that for U, but never by
much except for the .01 level. Indeed,
it would seem that for small differences
between means, say, 0.5, U may be
superior to t. Similar results (not shown)
were obtained when both sample sizes
were increased to 15.

Fig. 8. Empirical power functions for U
and t tests for sampling scheme
R(x,1)5-R(0,1)5.

The next case to be examined
occurs when both samples are taken from
exponential distributions as in Figure 9,
E(x,1)5-E(0,1)5. The empirical
alpha values are in the appropriate
ranges. Here again, for small
differences between the means, the U test
seems consistently more powerful than
the t test, but this advantage disappears
with greater mean differences. All in all
one may conjecture that when both
distributions have the same shape, even
though not normal, the power functions
of the t and U tests have a relatively
constant relationship. In most instances,
the t test is slightly more powerful than
the U test, as we have observed in most
of our examples.

Fig. 9. Empirical power functions for U
and t tests for sampling scheme
E(x,1)5-E(0,1)5.

All of the foregoing power functions
have come from combinations of
distributions which produce essentially
symmetric distributions of t and U when
the null hypothesis is true (zero
difference between means). We might
expect more severe disturbances in power
functions in those cases which, because of
basic asymmetries in the observed
t distributions, exhibited discordant
obtained alpha values in the previous
study. As may be recalled, the
asymmetric distributions arose when the
two parent populations differed in skew,
one normal, the other exponential, for
example. It was true, however, that
increasing the sample size greatly
diminished the asymmetries since the
underlying probability mechanisms tend to
normalize the distribution of t as sample
size increases.
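
The asymmetry described here can be seen by counting rejections separately in each tail when the null hypothesis is true. The sketch below rests on assumptions that should be read as such: the exponential population is taken to be an exponential distribution shifted to mean 0 with variance 1 (the earlier paper's exact definition may differ), the pooled-variance t is used, and the one-tailed .05 critical values are the tabled ones for 8 and 28 degrees of freedom (1.860 and 1.701). Function names and the seed are hypothetical.

```python
import math
import random
import statistics

def pooled_t(a, b):
    """Student's t for two independent samples, using the pooled variance."""
    n1, n2 = len(a), len(b)
    ss1 = statistics.variance(a) * (n1 - 1)
    ss2 = statistics.variance(b) * (n2 - 1)
    pooled = (ss1 + ss2) / (n1 + n2 - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(
        pooled * (1 / n1 + 1 / n2))

def tail_rates(n, crit, reps=4000, seed=3):
    """Upper- and lower-tail rejection rates for E(0,1)n-N(0,1)n, null true."""
    rng = random.Random(seed)
    upper = lower = 0
    for _ in range(reps):
        # Assumed form of E(0,1): exponential shifted to mean 0, variance 1
        a = [rng.expovariate(1.0) - 1.0 for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        t = pooled_t(a, b)
        upper += t > crit
        lower += t < -crit
    return upper / reps, lower / reps

upper5, lower5 = tail_rates(5, 1.860)     # 8 df
upper15, lower15 = tail_rates(15, 1.701)  # 28 df
print(upper5, lower5, upper15, lower15)
```

With samples of size 5 the upper tail is rejected far less often than the lower tail; with samples of size 15 the upper-tail rate moves back toward the nominal value, in line with the normalizing effect of sample size noted above.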

It is such conditions, probably not too
uncommon in the experience (or at least
the imagination) of the research worker,
which motivate a desire to seek statistical
tools which exhibit fewer allergic
reactions to violations of assumptions.

Because of these considerations, it is
interesting to compare the power
functions of the t and the U test under such
conditions of sampling. Figure 10 makes
the comparisons for the case of
exponential and normal distributions with
samples of size 5, E(x,1)5-N(0,1)5. First
to be noted are the discrepancies in the
obtained alpha values for the one-tailed
tests due to the asymmetrical distribution
of t and, surprisingly enough, of U.
The .048-one-tailed values are .022 for
U, .014 for t. Likewise, the .008-one-
tailed values are .002 for U, .001 for t.
This is to be contrasted with the
two-tailed values which are relatively close
to the nominal values: the .056-two-tailed
value for U is .049, for t it is .068; and
the .008-two-tailed value for U is .014,
for t, .011. These results for t are
comparable to those found in the previous
study. Thus we find that when
considering alpha, t and U both are affected
by sampling from populations which differ
in skew, although it is possible that the
effect on U is less than that on t.

A further examination of Figure 10
reveals that the empirical power functions
for U are, with one or two exceptions,
higher than the power functions for t,
but the advantage is slight.

As mentioned previously, increasing
the sample size tends to normalize the


distribution of t in the null case, the
effect being to lessen the discrepancy of
nominal and obtained values for the
one-tailed test. Figure 11 presents the
curves for the increased sample size case,
E(x,1)15-N(0,1)15. As before, our
attention is first directed to the
zero-mean-difference points, and as
expected, the values for t have become
closer to the nominal values: one-tailed
test, .037 for .05 level, and .006 for
.01 level; two-tailed test, .044 for
.05 level and .012 for .01 level.
Increasing the sample size to 15, however,
does not seem to improve the performance
of the U test. For the one-tailed U test,
the obtained alpha values are .016 for the
.05 level, and .000 for the .01 level. For
the two-tailed test, the obtained alpha
values again seem to be higher than the
nominal values: .068 for the .05 level,
and .019 for the .01 level. In Figure 11,
we note that the power functions for the
U test are almost invariably below those
for the t test.

Fig. 10. Empirical power functions for U
and t tests for sampling scheme
E(x,1)5-N(0,1)5.

Fig. 11. Empirical power functions for U
and t tests for sampling scheme
E(x,1)15-N(0,1)15.

A notable phenomenon occurs for the
.05-level, two-tailed U test. This
particular curve shows a decrease in power
from 0 mean difference to 0.25 before
starting up again. While sampling error
may well account for the dip, it is
certainly possible for a test to be
“biased,” the technical term for such an
occurrence. In fact there is no reason to
believe that the U test as a test for mean
differences should not be biased. The
U test is only
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fortuitously a test for mean differences, 
being fundamentally a test for differences
between distributions. As we have seen
it is less sensitive to some kinds of
differences between distributions than
others, being perhaps maximally sensitive
to differences in central tendency of two
distributions. But there are many measures
of central tendency, the mean and the
median, for example. We know that the
mean and median are different in skewed
distributions. In the combination of
distributions with which Figure 11 is
concerned, namely, exponential and normal,
it is possible for medians to be the same
for the two distributions, which
necessitates that the means be different,
and vice versa. For the t test, which
explicitly evaluates differences in sample
means, this seems to present no problem.
The U test, however, may well be more
sensitive to median differences and thus
show the bias when used as a test for
differences between means.
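
The point about means and medians can be made concrete with a small calculation. The sketch below assumes that E(x,1) denotes an exponential distribution shifted to have mean x and variance 1; that definition belongs to the earlier study and is an assumption here, as are the function names and the seed. It gives the median of such a distribution and estimates P(X > Y), the quantity to which U responds, for X exponential and Y normal with mean 0.

```python
import math
import random

def shifted_exp_median(mean):
    """Median of an exponential shifted to the given mean, variance 1."""
    return mean - 1.0 + math.log(2.0)

def prob_first_exceeds_second(mean1, reps=40000, seed=4):
    """Monte Carlo estimate of P(X > Y), X ~ E(mean1,1), Y ~ N(0,1)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(reps):
        x = rng.expovariate(1.0) - 1.0 + mean1  # assumed form of E(mean1,1)
        y = rng.gauss(0.0, 1.0)
        wins += x > y
    return wins / reps

median_at_equal_means = shifted_exp_median(0.0)
p_equal_means = prob_first_exceeds_second(0.0)
p_small_shift = prob_first_exceeds_second(0.125)
print(round(median_at_equal_means, 3), p_equal_means, p_small_shift)
```

Under these assumptions the exponential median lies about .31 below its mean, and P(X > Y) is noticeably below one half when the means are equal, approaching one half only after a small positive shift in the first mean. A two-tailed U test would then have its lowest rejection rate at a small nonzero mean difference rather than at zero, which is the kind of dip described for Figure 11.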

Other combinations of distributions
which were explored in no way change
the general picture which we have
continually observed. We may summarize
that picture as follows: In general the
t test is more powerful than the U test,
but never by much. Based on the
evidence we have seen, one might conjecture
that over a long series of experiments
involving distributions of the kind we
have used, a consistent use of the t test
might result in, say, 5% more rejections
of a false null hypothesis than would a
consistent application of the U test. There
are many other kinds of distributions
arising in research, however, for which
this statement need not apply. Depending
upon one’s inclinations, such a conclusion
could be interpreted as ample grounds
for habitual use of either of the two tests.
There are, as we have seen, other
considerations. As a general rule we might
say that the t test seems to provide the
appropriate power curve for the actual
alpha level involved, whereas the U test
shows more variability in its power
functions. That is to say that if the actual
(not the nominal) alpha is .05, the power
curve for the t test in most cases is
probably very similar to the power function
for the theoretical case for the .05 level;
it is also to say that the statement is not
nearly so true of the U test. This
property of the t test is useless, however,
unless we know what the actual alpha is.
In cases when assumptions are violated
we will not. It is true that the violations
of the assumptions underlying the t test
can produce large discrepancies between
nominal and actual alpha when the
sample sizes are other than nearly equal.
This in itself contraindicates the use of
the t test in these instances. A further
somewhat surprising consideration is that
the U test is not truly distribution free.
It is always sensitive to differences in
distributions, and sometimes seems more
affected by differences (other than mean
differences) than is the t test.

If one final word is to be said it might
be that one should not avoid using the
t test (provided relevant considerations
have been made) solely on the grounds
that it is subject to error when
assumptions are violated and that the U test
is not subject to error under the same
conditions. Both of the statements are
unreasonable in view of the data. One
should not, however, refrain from using
the U test in place of the t test on the
grounds that it is considerably less
powerful than the t test. This is simply
not true.
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